Numerical calculations of the pH of maximal protein stability
The effect of the sequence composition and three-dimensional structure
Emil Alexov
Howard Hughes Medical Institute and Columbia University, Biochemistry Department, New York, USA
A large number of proteins, found experimentally to have
different optimum pH of maximal stability, were studied to
reveal the basic principles of their preferenence for a par-
ticular pH. The pH-dependent free energy of folding was
modeled numerically as a function of pH as well as the net
charge of the protein. The optimum pH was determined in
the numerical calculations as the pH of the minimum free
energy of folding. The experimental data for the pH of
maximal stability (experimental optimum pH) was repro-
ducible (rmsd ¼0.73). It was shown that the optimum pH
results from two factors amino acid composition and the
organization of the titratable groups with the 3D structure.
It was demonstrated that the optimum pH and isoelectric
point could be quite different. In many cases, the optimum
pH was found at a pH corresponding to a large net charge of
the protein. At the same time, there was a tendency for
proteins having acidic optimum pHs to have a base/acid
ratio smaller than one and vice versa. The correlation
between the optimum pH and base/acid ratio is significant if
only buried groups are taken into account. It was shown that
a protein that provides a favorable electrostatic environment
for acids and disfavors the bases tends to have high optimum
pH and vice versa.
Keywords: electrostatics; pH stability; pK
a
; optimum pH.
The concentration of hydrogen ions (pH) is an important
factor that affects protein function and stability in different
locations in the cell and in the body [1]. Physiological pH
varies in different organs in human body: the pH in the
digestive tract ranges from 1.5 to 7.0, in the kidney it ranges
from 4.5 to 8.0, and body liquids have a pH of 7.2–7.4 [2]. It
was shown that the interstitial fluid of solid tumors have
pH ¼6.5–6.8, which differs from the physiological pH of
normal tissue and thus can be used for the design of pH
selective drugs [3].
The structure and function of most macromolecules are
influenced by pH, and most proteins operate optimally at a
particular pH (optimum pH) [4]. On the basis of indirect
measurements, it has been found that the intracellular pH
usually ranges between 4.5 and 7.4 in different cells [5]. The
organelles’ pH affects protein function and variation of pH
away from normal could be responsible for drug resistance
[6]. Lysosomal enzymes function best at the low pH of 5
found in lysosomes, whereas cytosolic enzymes function
best at the close to neutral pH of 7.2 [1].
Experimental studies of pH-dependent properties [7–11]
such as stability, solubility and activity, provide the benchmarks
for numerical simulation. Experiments revealed that altho-
ugh the net charge of ribonuclease Sa does affect the
solubility, it does not affect the pH of maximal stability or
activity [12]. Another experimental technique as acidic or
basic denaturation [13–15] demonstrates the importance of
electrostatic interactions on protein stability.
pH-dependent phenomena have been extensively mode-
led using numerical approaches [16–19]. A typical task is to
compute the pK
a
s of ionizable groups [20–26], the isoelectric
point [27,28] or the electrostatic potential distribution
around the active site [29]. It was shown that activity of
nine lipases correlates with the pH dependence of the
electrostatic potential mapped on the molecular surface of
the molecules [29]. pH dependence of unfolding energy was
modeled extensively and the models reproduced reasonable
the experimental denaturation free energy as a function of
pH [19,30–36].
The success of the numerical protocol to compute the
pH dependence of the free energy depends on the model
of the unfolded state, the model of folded state and thus
on the calculated pK
a
s. It is well recognized that the
unfolded state is compact and native-like, but the magni-
tude of the residual pairwise interactions and the desol-
vation energies has been debated. Some of the studies
found that any residual structure of the unfolded state has
negligible effect on the calculated pH dependence of
unfolding free energy [31], while others found the opposite
[33–36]. It was estimated that the pK
a
s of the acidic
groups in unfolded state are shifted by 0.3 pK units in
respect to the pK
a
s of model compounds. Although
including the measured and simulated pK shifts into the
model of unfolded state changes the pH dependence of
the unfolding free energy, it most of the cases it does not
change the pH of maximal stability [33–36]. Much more
Correspondence to E. Alexov, Howard Hughes Medical Institute and
Columbia University, Biochemistry Department, 630 W 168 Street,
New York, NY 10032, USA.
Fax: + 1 212 305 6926, Tel.: + 1 212 305 0265,
E-mail: ea388@columbia.edu
Abbreviations: MCCE, multi-conformation continuum electrostatic;
SAS, solvent accessible surface.
(Received 15 September 2003, accepted 11 November 2003)
Eur. J. Biochem. 271, 173–185 (2004) ÓFEBS 2003 doi:10.1046/j.1432-1033.2003.03917.x
important is the modeling of the folded state, where the
errors of computing pK
a
s could be significantly larger
than 0.3 units. Over the years it has been a continuous
effort to develop methods for accurate pK
a
predictions
[20,21]. These include empirical methods [37], macroscopic
methods [38–41], finite difference Poisson–Boltzmann
(FDPB)-based methods [20–22,42], FDPB and molecular
dynamics [43–45], FDPB and molecular mechanics
[25,46,47] and Warshel’s microscopic methods (e.g.,
[16,17]). The predicted pK
a
s were benchmarked against
the experimental data and the average rmsd were found to
vary from the best value of 0.5pK [38], to 0.7pK [48], to
0.83pK [25] and to 0.89 [22]. Multi-Conformation Con-
tinuum Electrostatics (MCCE) [25] method was shown to
be among the best pK
a
s predictors and it will be
employed in this work.
In the present work we compute the pH dependence of
the free energy of folding and the net charge. The optimum
pH was identified as the pH at which the free energy of
folding has minimum. A large number of proteins having
different optimum pH [49] were studied to find the effect of
the amino acid composition and 3D structure on the
optimum pH.
Experimental procedures
Methods
Calculations were carried out using available 3D structures
of selected proteins. A text search was performed on
BRENDA database [49] in the field of pH of stability.Fol-
lowing search strings were used: maximal stability,maxi-
mum stability,optimal stability,optimum stability,best
stability,highest stabilityand greatest stability.This
revealed 168 proteins with experimentally determined pHs
of maximal stability. Then a search of the Protein Data Base
(PDB) was performed to find available structures for these
proteins. An attempt was made to select PDB structures of
proteins from the same species as those used in the
experiment (43 structures). Structures with missing residues
were omitted as well as the structures of proteins participa-
ting in large complexes resulting in the final set of 28 protein
structures. The protein names, the PDB file names and the
experimental pH of maximal stability are provided in
Table 1. The source of the data is BRENDA database and
thus the present study is limited to the proteins listed there.
There will always be proteins with experimentally determined
Table 1. Proteins and corresponding PDB [57] files used in the paper. The experimental optimum pH (pH of optimal stability) is taken from
BRENDA website [49]. The calculated optimum pH (the pH of the minimum of free energy of folding) is given in the forth column. The difference is
the calculated optimum pH minus the experimental number (fifth column). Bases/acid ratio for all ionizable groups is in sixth column, while the
seventh shows the bases/acids ratio for 66% buried groups. The last three columns show the averaged intrinsic pK shift, the averaged pK
a
shift and
the net charge of the folded protein at pH optimum, respectively.
Protein pdb code
Experimental
optimum
pH
Calculated
optimum
pH Difference
Base/acid
ratio
Buried
base/acid
ratio
Averaged
intrinsic
pK shift
Averaged
pK
a
shift
Net charge at
optimum pH
Dioxygenase 1b4u 8.0 8.0 0.0 0.94 1.33 0.08 )0.51 )3.0
Transferase 1f8x 6.5 5.0 )1.5 0.72 0.28 0.40 0.34 )5.5
Glutathione synthetase 1sga 8.0 7.5 )0.5 0.87 0.88 0.41 )0.58 )10.0
Isomerase 1b0z 6.0 6.0 0.0 1.02 0.90 0.05 )0.48 2.1
Coenzyme A 1bdo 6.5 7.0 0.5 0.67 1.50 0.22 0.03 )4.1
Dienelactone hydrolase 1din 7.0 6.5 )0.5 1.04 1.17 0.26 )0.36 )2.7
Dehydrogenase 1dpg 6.2 6.0 )0.2 0.79 1.05 0.38 )0.41 )13.0
Endothiapepsin 1gvx 4.15 4.0 )0.15 0.52 0.07 1.45 2.06 6.5
Dehydratase 1aw5 9.0 9.0 0.0 1.07 0.85 0.17 )0.48 )6.8
Cathepsin B 1huc 5.15 5.0 )0.15 0.90 0.73 1.28 0.11 5.8
Alginate lyase 1hv6 7.0 7.0 0.0 1.17 0.93 0.63 )0.72 2.7
Xylanase 1igo 5.5 6.5 1.0 1.41 1.00 0.60 )0.74 7.3
Hydrolase 1iun 7.5 7.0 )0.0 0.86 1.50 0.11 )1.15 )1.1
Aspartic protease 1j71 4.15 3.0 )1.15 0.54 0.33 0.98 1.32 9.4
Aldolase 1jcj 8.5 8.5 0.0 0.97 0.54 0.55 )0.19 )5.1
L
-Asparaginase 1jsl 8.5 7.0 )1.5 1.17 1.85 )0.12 )0.83 )0.1
Amylase 1lop 5.9 6.0 0.1 0.81 1.00 0.33 )0.42 )8.2
c-Glutamil hydrolase 1l9x 7.0 7.5 0.5 1.19 0.77 0.45 )0.02 2.8
Mutase 1m1b 7.0 6.0 )1.0 0.95 0.86 0.25 0.13 )3.2
Methapyrogatechase 1mpy 7.7 7.0 )0.7 1.0 1.33 0.11 )1.35 )12.0
Pyrovate oxidase 1pow 5.7 6.0 0.3 0.91 0.78 0.60 )0.51 )2.0
Chitosanase 1qgi 6.0 6.5 0.5 1.09 0.54 0.29 )0.31 5.0
Xylose isomerase 1qt1 8.0 8.0 0.0 0.84 1.50 0.24 )0.30 )16.0
Pyruvate decarboxylase 1zpd 6.0 7.0 1.0 1.02 0.83 0.47 )0.24 3.8
Acid a-amylase 2aaa 4.9 4.0 )0.90 0.51 0.64 1.53 1.48 )1.7
Formate dehydrogenase 2nac 5.6 7.0 1.40 1.11 1.42 0.06 )1.1 2.4
Phosphorylase 2tpt 6.0 5.0 )1.0 0.91 0.93 0.38 )0.34 )3.8
b-Amylase 5bca 5.5 5.0 )0.5 1.07 0.91 0.19 )0.13 15.1
174 E. Alexov (Eur. J. Biochem. 271)ÓFEBS 2003
optimum pH that were not in the database, and therefore are
not modeled in the paper. However, an additional four well
studied proteins were used to benchmark the method in
broad pH range and to compare the effect of mutations.
Free energy and net charge of unfolded state
The unfolded state is modeled as a chain of noninteracting
amino acids (the possibility of residual interactions in the
unfolded state is discussed at the end of the discussion
section). Thus, the free energy of ionizable groups (pH-
dependent free energy) is calculated as [31]:
DGunf ¼kT lnðZunfÞ
¼kT X
N
i1
lnf1þexp½2:3cðiÞðpHpKsol ðiÞÞg
ð1Þ
where kis the Boltzmann constant, Tis the temperature in
Kelvin degrees, Nis the number of ionizable groups, c(i)is1
for bases, )1foracids,pK
sol
(i) is the standard pK
a
value in
solution of group i(e.g., [47]), pH is the pH of the solution
and N is the number of ionizable residues. Z
unf
is the
partition function of unfolded state and DG
unf
is the free
energy of unfolded state. The reference state of zero free
energy is defined as state of all groups in their neutral forms
[31].
The net charge is calculated using the standard formula
that comes from Henderson–Hasselbalch equation:
qunf ¼X
N
i¼1
10cðiÞðpHpKsol ðiÞÞ
1þ10cðiÞðpHpKsol ðiÞÞ cðiÞð2Þ
where c(i)¼)1 or +1 in the case of acid or base,
respectively.
Free energy and net charge of the folded state
The pH-dependent free energy of the folded state is
calculated using the 3D structure of proteins listed in
Table 1. The 3D structure comprises Nionizable groups
(the same number as in the unfolded state) and Lpolar
groups. Each of them might have several alternative side-
chain rotamers [50], or alternative polar proton positions
[47]. In addition, ionizable groups are either ionized or
neutral. All these alternatives are called conformers,being
ionizational and positional conformers. There is no apriori
information to indicate which conformer is most likely to
exist at certain conditions of, for example, pH and salt
concentration. Each microstate is comprised of one con-
former per residue. The Monte Carlo method was used to
estimate the probability of microstates. This procedure
is called multi-conformation continuum electrostatics (MC
CE) and it is described in more details elsewhere [25,47,50]. A
brief summary of the MCCE method is provided in a later
section.
To find the free energy one should calculate the
partition function for each of the proteins. Thus, one
should construct all possible combinations of conformers.
Because of the very large number of conformers (most of
the cases more than 1000), the Monte Carlo method
(Metropolis algorithm [51]) is used to find the probability
of the microstates [20,47,50,52]. However, to construct the
partition function one should know all microstate energies
and to sum them up as exponents. Each microstate
energy should be taken only once, which induces extra
level of complexity. A special procedure is designed that
collects the lowest microstate energies and that assures
that each microstate is taken only once [50]. A microstate
was considered to be unique if its energy differs by more
than 0.001 kT from the energies of all previously
generated states. A much more stringent procedure that
compares the microstate composition would require
significant computation time and therefore was not
implemented. This results in a function that estimates
the partition function. This effective partition function
will not have the states with high energy (they are rejected
by the Metropolis algorithm), but they have negligible
effect [53]. In addition, the constructed partition function
may not have all low energy microstates, because given
microstate may not be generated in the Monte Carlo
sampling or because two or more distinctive microstates
may have identical or very similar energies. Bearing in
mind all these possibilities, the effective partition function
(Z
fol
)iscalculatedas[50]:
Zfol ¼X
Xfol
n¼1
expðDGfol
n=kTÞð3Þ
where DGfol
nis the energy of the microstate nand X
fol
is the
number of microstates collected in Monte Carlo procedure.
Then the free energy of ionizable and polar groups in folded
state is:
DGfol ¼kT lnðZfol Þð4Þ
The occupancy of each conformer (qfol
i) [52] is calculated
in the Metropolis algorithm and then used to calculate the
net charge of the folded state:
qfol ¼X
M
i¼1
qfol
icðiÞð5Þ
Mis the total number of conformers. [Note that c(i)¼0 for
non ionizable conformers.]
Free energy of folding
The pH-dependent free energy of folding is calculated as a
difference between the free energy of folded and unfolded
states:
DDGfolding ¼DGfol DGunf ð6Þ
An alternative formula of calculating the pH dependence
of the free energy of folding is [19,31,54,55]:
DDGfolding ¼2:3kT Z
pH2
pH1
DqdpH ð7Þ
where, pH
1
and pH
2
determine the pH interval and Dqis the
change of the net charge of the protein from unfolded to
folded state.
ÓFEBS 2003 Calculating pH of maximal protein stability (Eur. J. Biochem. 271) 175
Computational method: MCCE method
The basic principles of the method have been described
elsewhere [47,50]. The MCCE [25] method allows us to find
the equilibrated conformation and ionization states of
protein side chains, buried waters, ions, and ligands. The
method uses multiple preselected choices for atomic posi-
tions and ionization states for many selected side chains and
ligands. Then, electrostatic and nonelectrostatic energies
are calculated, providing look-up tables of conformer self-
energies and conformer–conformer pairwise interactions.
Protein microstates are then constructed by choosing one
conformer for each side chain and ligand. Monte Carlo
sampling then uses each microstate energy to find each
conformer’s probability.
Thus, the MCCE procedure is divided into three stages:
(a) selection of residues and generation of conformers; (b)
calculation of energies and (c) Monte Carlo sampling.
Selection of residues. The amino acids that are involved in
strong electrostatic interactions (magnitude > 3.5 kT) are
selected. They will be provided with extra side-chain
rotamers to reduce the effects of possible imperfections of
crystal structures. The reason is that a small change in their
position might cause a significant change in the pairwise
interactions [56]. The threshold of 3.5 kT is chosen based on
extensive modeling of structures and fitting to experiment-
ally determined quantities [25]. The selection is made by
calculating the electrostatic interactions using the ori-
ginal PDB [57] structure. The alternative side chains for
these selected residues are built using a standard library of
rotamers [58] and by adding an extra side chain position
using a procedure developed in the Honig’s laboratory [59].
The backbone is kept rigid. Then the original structure and
alternative side chains were provided with hydrogen atoms.
Polar protons of the side chains are assigned by satisfying all
hydrogen acceptors and avoiding all hydrogen donors [25].
Thus, every polar side chain and neutral forms of acids have
alternative polar proton positions.
Calculation of energies. The alternative side chains and
polar proton positions determine the conformational
space for a particular structure, and they are called
conformers. The next step is to compute the energies of
each conformer and to store them into look-up tables.
Because of conformation flexibility, the energy is no
longer only electrostatic in origin, but also has nonelec-
trostatic component [47,50].
Electrostatic energies are calculated by DelPhi [60,61],
using the PARSE [62] charge and radii set. Internal
dielectric constant is 4 [63], while the solution dielectric
constant is taken to be 80. The molecular surface is
generated with a water probe of radius 1.4 A
˚[64]. Ionic
strength is 0.15
M
and the linear Poisson–Boltzmann
equation is used. Focusing technique [65] was employed to
achieve a grid resolution of about two grids per A
˚ngstrom.
The Mcalculations, where Mis the number of conformers,
produce a vector of length Mfor reaction field energy
DG
rxn,i
and an MxM array of the pairwise interactions
between all possible conformers DGij
el . In addition, each
conformer has pairwise electrostatic interactions with the
backbone resulting in a vector of length MDG
pol,i
.The
magnitude of the strong pairwise and backbone interactions
is altered as described in [56]. Such a correction was
shown to improve significantly the accuracy of the calcu-
lated pK
a
s[25].
Having alternative side chains and polar hydrogen
positions requires nonelectrostatic energy to be taken into
account too. This energy is a constant in calculations that
use a rigidprotein structure (and therefore should not be
calculated), but in MCCE plays important role discrim-
inating alternative positional conformers. The non-
electrostatic interactions for each conformer are the
torsion energy, a self-energy term which is independent
of the position of all other residues in the protein, and
the pairwise Lennard–Jones interactions, both with por-
tions of the protein that are held rigid, and with
conformers of side chains that have different allowed posi-
tions [25,47,50].
Thus, the microstate npH-dependent free energy of
folded state is [20,21,47,50]:
DGfol
n¼X
M
i¼12:3kTdnðiÞ½cðiÞðpH pKsolðiÞÞ þ DpKint ÞðiÞ
þX
M
j¼iþ1
dnðiÞdnðjÞðGij
el þGij
nonel Þ;
DpKintðiÞ¼DpKsolv ðiÞþDpKdipðiÞþDpKnonel ðiÞ
ð8Þ
where d
n
(i)is1ifith conformer is present in the nth
microstate, Mis the total number of conformers, DpK
int
(i)
is the electrostatic and non electrostatic permanent energy
contribution to the energy of conformer i(note that it does
not contain interactions with polar groups), c(i)is1for
bases, )1 for acids, and 0 for neutral groups, DpK
solv
(i)isthe
change of solvation energy of group i,DpK
dip
(i)isthe
electrostatic interactions with permanent charges,
DpK
nonel
(i) is the nonelectrostatic energy with the rigid part
of protein, Gij
el and Gij
nonel are the pairwise electrostatic and
non electrostatic interactions, respectively, between con-
former iand j.
Monte Carlo sampling. TheMonteCarloalgorithmis
used to estimate the occupancy (the probability) of each
conformer at given pH. The convergence is considered
successful if the average fluctuation of the occupancy is
smaller than 0.01 [25]. The pH where the net charge of given
titratable group is 0.5 is pK
½
. To adopt a common
nomenclature, pK
½
will be referred as pK
a
throughout the
text.
Optimum pH, isoelectric point (pI) and bases/acids ratio
The experimental pH of maximal stability for each of the
proteins listed in Table 1 is taken from the website
BRENDA [49]. The database does not always provide a
single number for the optimum pH. If given protein is
reported to be stable in a range of pHs, then the optimum
pH is taken to be the middle of the pH range.
The optimum pH in the numerical calculation is deter-
mined as pH at which the free energy of folding has
minimum. In the case that the free energy of folding has a
176 E. Alexov (Eur. J. Biochem. 271)ÓFEBS 2003
minimum in a pH interval, the optimum pH is the middle of
the interval. The calculations were carried out in steps of
DpH ¼1. Thus, the computational resolution of determin-
ing the pH optimum was 0.5 pH units.
The calculated and experimental pH intervals were not
compared, because in many cases BRENDA database
provides only the pH of optimal stability. In addition, in
most cases the experimental pH interval of stability given in
the BRENDA database does not provide information for
the free energy change that the protein can tolerate and still
be stable. Therefore it cannot be compared with the
numerical results which provide only the pH dependence
of the folding free energy. Some proteins may tolerate a
free energy change of 10 kcalÆmol
)1
and still be stable, while
others became unstable upon a change of only a few
kcalÆmol
)1
.
The calculated isoelectric point (pI) is the pH at which
the net charge of folded state is equal to zero. There is
practically no experimental data for the pI of the proteins
listed in Table 1. The net charge at optimum pH is the
calculated net charge of the folded protein at pH
optimum. Base/acid ratio was calculated by counting all
Asp and Glu residues as acids and all Arg, Lys and His
residues as bases. In some cases, one or more acidic and/
or His residues was calculated to be neutral at a particular
pH optimum, but they were still counted. The reason for
this was to avoid the bias of the 3D structure and to
calculate the base/acid ratio purely from the sequence.
The given residue is counted as 66% buried if its
solvent accessible surface (SAS) is one-third of the SAS
in solution. Averaged intrinsic pK shifts were calculated
as
1
NX
N
i¼1
ðpKint ðiÞpKsol ðiÞÞ
and the averaged pK
a
sshiftas
1
NX
N
i¼1
ðpKaðiÞpKsolðiÞÞ
Thus, a negative pK shift corresponds to conditions such
that the protein stabilizes acids and destabilizes bases and
vice versa. Arginines were not included in the calculations
because their pK
a
s are calculated in many cases to be
outside the calculated pH range.
Results
Origin of optimum pH
The paper reports the pH dependence of the free energy of
folding. Despite the differences among the calculated
proteins, the results show that the pH-dependence profile
of the free energy of folding is approximately bell-shaped
and has a minimum at a certain pH, referred to through the
paper as the optimum pH.
To better understand the origin of the optimum pH, a
particular case will be considered in details. Figure 1A
shows the free energies of cathepsin B calculated in pH
range 0–14. Three energies were computed: the free energy
of the unfolded state (bottom line), the free energy of the
folded state (middle line) and the free energy of folding (top
curve). For the sake of convenience the free energies of the
folded state and folding are scaled by an additive constants
so to have the same magnitude as the free energy of the
unfolded state at the pH of the extreme value (in this case
pH ¼5). It improves the resolution of the graph without
changing its interpretation, because the energies contain an
undetermined constant (hydrophobic interactions, entropy
change, van der Waals interactions and other pH-inde-
pendent energies).
Free energy of unfolded state. It can be seen (Fig. 1A) that
the free energy of the unfolded state has a maximum value
at pH ¼5 and it rapidly decreases at low and high pHs.
Such a behavior can be easily understood given equation 1.
At low pH, the pK
sol
of all acidic groups is higher than the
current pH and thus they contribute negligible to the
partition function. In contrast, all basic groups contribute
significantly to the partition function. As the pH decreases,
their contribution increases, making the free energy more
negative. At medium pHs, all ionizable groups are ionized
(except His and Tyr), but their effect on the free energy is
quite small, because their pK
sol
areclosetothepH.This
results in a maximum of the free energy corresponding to
the least favorable state. At high pHs, the situation is
reversed: all acidic groups have a major contribution to the
partition function, while bases add very little. Thus, the free
energy profile of the unfolded state is always a smooth curve
(bell-shaped) with a maximum at a certain pH. The shape of
the curve and the position of the maximum depend entirely
upon the amino acid composition.
Fig. 1. Cathepsin B pH-dependent properties.
(A) Free energy; (B) net charge.
ÓFEBS 2003 Calculating pH of maximal protein stability (Eur. J. Biochem. 271) 177