What determines the degree of compactness of a calcium-binding protein?
Liliane Mouawad1, Adriana Isvoran2, Eric Quiniou1 and Constantin T. Craescu1
1 Inserm U759/Institut Curie-Recherche, Centre Universitaire Paris-Sud, Orsay, France
2 Department of Chemistry, West University of Timisoara, Romania
Keywords calcium-binding proteins; centrin; EF-hand; hydrophobicity; predicted form
Correspondence L. Mouawad, Inserm U759/Institut Curie-Recherche, Centre Universitaire Paris-Sud, Bâtiment 112, 91405 Orsay Cedex, France Fax: +33 1 69 07 53 27 Tel: +33 1 69 86 71 51 E-mail: liliane.mouawad@curie.u-psud.fr
(Received 8 September 2008, revised 8 December 2008, accepted 10 December 2008)
doi:10.1111/j.1742-4658.2008.06851.x
The EF-hand calcium-binding proteins may exist in either an extended or a compact conformation. This conformation is sometimes correlated with the function of the calcium-binding protein. For those proteins whose structure and function are known, calcium sensors are usually extended and calcium buffers compact; hence, there is interest in predicting the form of the protein starting from its sequence. In the present study, we used two different procedures: one that already exists in the literature, the sosuidumbbell algorithm, mainly based on the charges of the two EF-hand domains, and the other comprising a novel procedure that is based on linker average hydrophilicity. The linker consists of the residues that connect the domains. The two procedures were tested on 17 known-structure calcium-binding proteins and then applied to 59 unknown-structure centrins. The sosuidumbbell algorithm yielded the correct conformations for only 15 of the known-structure proteins and predicted that all centrins should be in a closed form. The linker average hydrophilicity procedure discriminated well between all the extended and non-extended forms of the known-structure calcium-binding proteins, and its prediction concerning centrins reflected well their phylogenetic classification. The linker average hydrophilicity criterion is a simple and powerful means to discriminate between extended and non-extended forms of calcium-binding proteins. What is remarkable is that only a few residues that constitute the linker (between 2 and 20 in our tested sample of proteins) are responsible for the form of the calcium-binding protein, showing that this form is mainly governed by short-range interactions.
Calcium transport and/or regulation are important events for the normal morphology and metabolism of the cell and play significant roles in the mechanisms of many disease processes [1]. The proteins that interact with the calcium ions involved in these events are called calcium-binding proteins (CaBPs). They form two main subfamilies: the EF-hand CaBPs and the non-EF-hand CaBPs. EF-hand CaBPs, whose prototype is calmodulin [2], are characterized by the presence of structural motifs called ‘EF-hands’. Non-EF-hand CaBPs do not use this structural motif to bind calcium; they may be found in the cytoplasm (similar to C2 domain proteins) [3], in the extracellular medium [4] or associated with the membrane (similar to annexins) [5].
For the EF-hand CaBPs, each EF-hand motif contains two helices connected by the calcium-binding loop, a highly conserved region that binds the metal ion. Many CaBPs exhibit two domains, each containing two EF-hand motifs; the N-terminal (helices A, B, C and D) and C-terminal (helices E, F, G and H) domains are connected by a linker region (Fig. 1).
Abbreviations CaBP, calcium-binding protein; LAH, linker average hydrophilicity; PDB, Protein Data Bank.
FEBS Journal 276 (2009) 1082–1093 Journal compilation © 2009 FEBS. No claim to original French government works
L. Mouawad et al.
Compactness of calcium-binding proteins
Fig. 1. Schematic representation of an EF-hand protein. [Figure labels: N-domain (helices A–D, loops I and II), C-domain (helices E–H, loops III and IV) and linker.] Each EF-hand motif consists of two helices linked by a calcium loop (black dots represent calcium ions). Two motifs constitute one EF-hand domain. The N- and C-domains are joined by a linker (bold line).

EF-hand CaBPs are divided into two broad classes [6]: those that bind calcium to regulate its concentration (calcium-buffering and calcium-transporting proteins) and those that bind calcium to decode its signal (calcium-sensor proteins). The two functional classes also have different structural features: calcium-buffering and calcium-transporting proteins, such as parvalbumin [7] or the Nereis diversicolor sarcoplasmic calcium-binding protein [8], usually have a compact tertiary structure and are not conformationally sensitive to calcium binding, whereas calcium-sensor proteins, such as calmodulin [2] and troponin C [9], have extended tertiary structures and show important conformational changes upon calcium binding. In the extended form, the linker between the two domains may be structured in a straight helix, whereas, in the non-extended form, the linker is unstructured, leading to either a floppy conformation or a very compact one (Fig. 2) [10]. It is important to understand the physical reasons for these differences. This would provide tools to predict the form of the CaBPs from their sequences, and therefore indicate their biological function.

Recently, a protein classification tool, sosuidumbbell [11], was developed to predict the degree of compactness of proteins starting from their amino acid sequences. This tool is based on studies undertaken on all the monomers of the Protein Data Bank (PDB) [12], and not just CaBPs, indicating that the electrostatic repulsion between the domains is a dominant factor in the stabilization of the extended structures, in addition to the amphiphilic character of the central flexible region. By contrast, globular proteins are predicted to be stabilized by a hydrophobic core built by residues from the two domains. Using the sosuidumbbell algorithm, we have analyzed 17 CaBPs with known 3D structures (Table 1). Fifteen of them were predicted in the correct form but, unfortunately, two structures were incorrectly predicted. Indeed, human calmodulin-like protein (1GGZ) [13] and human centrin 2 (2GGM) [14], which are extended proteins, were predicted to be compact. These exceptions represent a non-negligible percentage (12%) and they emphasize the need for a more detailed analysis of the sequence–structure relationship in the case of CaBPs.

Fig. 2. View of the 3D structures of two CaBPs: (A) the extended form of calmodulin (PDB code: 1CLL) and (B) the non-extended form of guanylate cyclase activating protein 2 (PDB code: 1JBA). The helices are in cyan, the β-sheets are in yellow and the linker is in red. The linker in 1CLL is structured, whereas it is a loop in 1JBA. The view was drawn using VMD software [10].

In the present study, we have developed a novel procedure based on the linker average hydrophilicity (LAH), which we applied to our sample of 17 known-structure CaBPs and to unknown structures of centrins. Centrins, a subfamily of CaBPs, are essential components of microtubule-organizing centers in organisms ranging from algae and yeast to humans [15,16]. They are EF-hand calcium-binding proteins with a sequence similarity to calmodulin but distinct calcium-binding properties [15]. They were shown to be involved in centrosome duplication [17] and the contraction of centrin-based fiber systems [18] and to play a functional role in nuclear export pathways [19]. The Ca2+ dependence of the centrin interactions with their targets suggests that centrins play a regulatory role by activating or changing the conformation of various target proteins. Analyses of amino acid sequences of centrins from different organisms reveal at least four phylogenetic families and several phylogenetic subfamilies [20,21]. The centrins that we consider in the present study are listed in Table 2: (a) the Chlamydomonas reinhardtii-like family (CrCen-like), which contains centrins from the subfamilies of green algae and vertebrate isoforms Cen1 and Cen2; (b) the higher plants Arabidopsis-like family (AtCen-like); (c) the yeast Saccharomyces cerevisiae-like family (Cdc31-like), which contains mainly two subfamilies, fungal centrins and the vertebrate isoform Cen3; and (d) the Paramecium tetraurelia infraciliary lattice family (PtICL1-like),
Table 1. Features of the known-structure CaBPs used in the present study, showing the name of the protein, its code in the PDB, its code in the SwissProt data bank, its form as determined experimentally and its form as predicted by the SOSUIDUMBBELL algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html). CIB, calcium-and-integrin-binding protein; SCBP, sarcoplasmic calcium-binding protein.

| Protein | PDB code | SwissProt code | Experimental structure | Structure predicted by the SOSUIDUMBBELL algorithm |
|---|---|---|---|---|
| Chicken troponin C | 4TNC | P02588 | Extended | Extended |
| Rabbit troponin C | 1TN4 | P02586 | Extended | Extended |
| Human calmodulin | 1CLL | P62158 | Extended | Extended |
| Paramecium calmodulin | 1OSA | P07463 | Extended | Extended |
| Potato calmodulin | 1RFJ | Q42478 | Extended | Extended |
| Human calmodulin-like protein | 1GGZ | P27482 | Extended | Non-extended |
| Human centrin 2 | 2GGM | P41208 | Extended | Non-extended |
| Yeast centrin | 2DOQ | P06704 | Non-extended | Non-extended |
| Yeast myosin light chain | 1GGW(a) | Q09196 | Non-extended | Non-extended |
| Calcineurin B homologous protein 1 | 2CT9 | P61023 | Non-extended | Non-extended |
| Bovine recoverin | 1REC | P21457 | Non-extended | Non-extended |
| Guanylate cyclase activating protein 2 | 1JBA(a) | P51177 | Non-extended | Non-extended |
| Bovine neurocalcin δ | 1BJF | P61602 | Non-extended | Non-extended |
| Amphioxus SCBP | 2SAS | P04570 | Non-extended | Non-extended |
| Sandworm SCBP | 2SCP | P04571 | Non-extended | Non-extended |
| Bacterial calerythrin | 1NYA(a) | P06495 | Non-extended | Non-extended |
| Human CIB | 1DGU(a) | Q99828 | Non-extended | Non-extended |

(a) Structure determined by NMR.
organized in ten subfamilies that contain 35 identified isoforms [22]. The 3D structure of the entire protein in complex with its target polypeptide is known for only two centrins: the human centrin, HsCen2 (2GGM) [14], and the Saccharomyces cerevisiae centrin, ScCdc31 (2DOQ) [23].
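The agreement between the experimental and predicted forms listed in Table 1 can be tallied with a few lines of Python. This is only an illustrative sketch: the data are copied from Table 1, while the names `TABLE_1` and `tally` are introduced here for convenience and are not part of the original study.

```python
# Experimental versus SOSUIDUMBBELL-predicted forms, copied from Table 1.
# "E" = extended, "N" = non-extended.
TABLE_1 = [
    ("4TNC", "E", "E"), ("1TN4", "E", "E"), ("1CLL", "E", "E"),
    ("1OSA", "E", "E"), ("1RFJ", "E", "E"), ("1GGZ", "E", "N"),
    ("2GGM", "E", "N"), ("2DOQ", "N", "N"), ("1GGW", "N", "N"),
    ("2CT9", "N", "N"), ("1REC", "N", "N"), ("1JBA", "N", "N"),
    ("1BJF", "N", "N"), ("2SAS", "N", "N"), ("2SCP", "N", "N"),
    ("1NYA", "N", "N"), ("1DGU", "N", "N"),
]


def tally(rows):
    """Return (number of correct predictions, list of mispredicted PDB codes)."""
    wrong = [code for code, observed, predicted in rows if observed != predicted]
    return len(rows) - len(wrong), wrong


n_correct, mispredicted = tally(TABLE_1)
error_rate = 100 * len(mispredicted) / len(TABLE_1)  # percentage of mispredictions
print(n_correct, mispredicted, round(error_rate))
```

Two mispredictions out of 17 structures give the roughly 12% error rate quoted in the text.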
The functional diversity of centrins should depend on their sequence and their Ca2+ binding properties. However, we may ask whether the global conformation or the conformational preference of individual centrin molecules also plays a role in the target recognition and the plasticity of heteromolecular complexes. This idea is supported by the recent observation that yeast ScCdc31 bound to a ScSfi1 fragment shows a bent conformation [23], whereas human HsCen2 in complex with an XPC peptide is completely extended [14]. In the present study, we present a new and simple theoretical procedure for the global shape prediction of EF-hand proteins that allows us to analyze the possible shape diversity of centrins presented in Table 2.
Results and Discussion
Utilization of the SOSUIDUMBBELL algorithm
We first applied the sosuidumbbell algorithm (http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html) to all the CaBPs with known 3D structures (Table 1). In this algorithm, a structure is predicted to be extended if it obeys four criteria: (a) the absolute value of the net charge of the entire protein is higher than 20 (|Qprot| > 20); (b) the absolute net charge density (|Qprot|/N, where N is the total number of residues) is higher than 0.14 (dQ > 0.14); (c) there is a charge balance between the two domains (|QNQC| > 100); and (d) there is a high amphiphilicity at the center of the linker region and a high hydropathy at its termini [11]. Based on these four criteria, the results yielded 15 well-predicted structures and two incorrectly predicted ones. The latter are human calmodulin-like protein (1GGZ) and human centrin 2 (2GGM), the structures of which are extended but predicted as non-extended. Therefore, the question remained as to which of the four criteria described above is responsible for this misprediction. To address this question, we initially verified the first two criteria. For this purpose, we calculated the absolute net charge and the charge density of the entire protein for all the investigated CaBPs (Table 3), with known and unknown structures (Tables 1 and 2). First, we followed exactly the procedure described by Uchikoga et al. [11], namely that histidine residues were considered as positively charged (although at the pH values corresponding to the great majority of the experiments, they are deprotonated) and the calcium ions that might bind to the protein were omitted. The results
Table 2. Phylogenetic classification of centrins. All centrins considered in the present study (with known and unknown structures) are classified by families and subfamilies. The PDB codes of the known structures of fragments (*) or of the entire protein are given. Family and subfamily labels are shown on the first row of each group.

| Phylogenetic family | Subfamily | Protein name | Abbreviation | SwissProt code / PDB code |
|---|---|---|---|---|
| CrCen | Cen1 | Human centrin 1 | HsCen1 | Q12798 |
| | | Mouse centrin 1 | MmCen1 | P41209 |
| | | Bovine centrin 1 | BtCetn1 | Q32LE3 |
| | Cen2 | Human centrin 2 | HsCen2 | P41208 / 2GGM / 2OBH / 1M39* / 1ZMZ* / 2A4J* |
| | | Mouse centrin 2 | MmCen2 | Q9R1K9 |
| | | Pig centrin 2 | SsCen2 | Q4U4N2 |
| | | Xenopus laevis centrin 2 | XlCen2 | Q7SYA4 |
| | | Xenopus tropicalis centrin 2 | XtCen2 | Q28HC5 |
| | Algae centrins | Dunaliella salina centrin | DsCen | P54213 |
| | | Chlamydomonas reinhardtii centrin | CrCen | P05434 / 1OQP* / 2AMI* |
| | | Tetraselmis striata centrin | TsCen | P43646 |
| | | Scherffelia dubia centrin | SdCen | Q06827 |
| | | Micromonas pusilla centrin | MpCen | Q40303 |
| | | Marsilea vestita centrin | MvCen | O49999 |
| | | Spermatozopsis similis centrin | SsCen | P43645 |
| | | Pterosperma cristatum centrin | PcCen | Q40791 |
| AtCen | Higher plant centrins | Arabidopsis thaliana centrin | AtCen | O82659 |
| | | Nicotiana tabacum centrin 1 | NtCen | Q9SQI5 |
| | | Atriplex nummularia centrin | AnCen | P41210 |
| Cdc31 | Cen3 | Human centrin 3 | HsCen3 | O15182 |
| | | Rat centrin 3 | RnCen3 | Q91ZZ8 |
| | | Mouse centrin 3 | MmCen3 | O35648 |
| | | Xenopus laevis centrin 3 | XlCen3 | Q9DEZ4 |
| | | Euplotes octocarinatus centrin | EoCen | Q9XZV2 / 2JOJ* |
| | | Yeast centrin | ScCdc31 | P06704 / 2DOQ / 2GV5 |
| | | Xenopus tropicalis centrin 3 | XtCen3 | Q28GW2 |
| PtICLs | ICL1a | | PtICL1a | Q27177 |
| | | | PtICL1b | Q27179 |
| | | | PtICL1c | Q27178 |
| | | | PtICL1d | Q94726 |
| | | | PtICL1f | Q3SEK2 |
| | ICL1e | | PtICL1e | Q3SEK0 |
| | | | PtICL1g | Q3SEJ9 |
| | | | PtCen8 | Q3SEJ6 |
| | | | PtCen10 | Q3SEJ7 |
| | | | PtCen12 | Q6BFB6 |
| | | | PtCen15 | Q3SEJ0 |
| | | | PtCen18 | A0CTY5 |
| | ICL3a | | PtICL3a | Q3SDB8 |
| | | | PtICL3c | Q3SDA6 |
| | | | PtICL3d | Q3SEI1 |
| | | | PtICL3e | Q3SEI3 |
| | | | PtICL3f | Q3SEI4 |
| | ICL3b | | PtICL3b | Q3SEI0 |
| | | | PtICL3g | A0BUT1 |
| | ICL5 | | PtICL5a | Q3SEH8 |
| | | | PtICL5b | Q3SEH7 |
| | | | PtICL6a | Q3SEH9 |
| | | | PtICL6b | Q3SCX3 |
| | ICL7 | | PtICL7a | A0DZH6 |
| | | | PtICL7b | A0DZH |
| | ICL8 | | PtICL8a | A0BTY0 |
| | | | PtICL8b | A0C3G3 |
| | ICL9 | | PtICL9a | Q3SEI2 |
| | | | PtICL9b | A0BE66 |
| | | | PtICL9c | A0D3D5 |
| | | | PtICL9d | A0D6A4 |
| | ICL10 | | PtICL10a | A0DZD2 |
| | | | PtICL10b | A0BJD5 |
| | ICL11 | | PtICL11a | A0BI27 |
| | | | PtICL11b | A0BQH1 |
(Fig. 3A,B and Table 3) show that, as indicated above, only five known-structure proteins are predicted to be extended instead of the seven expected (1GGZ and 2GGM are mispredicted) and all centrins with unknown structures are predicted in a non-extended form. In a second step, the histidines were considered neutral (CaBPs usually contain very little His) and the Ca2+ ions were added, but the results were even worse (data not shown) because the net charge was diminished and therefore the structures were predicted to be even more compact. The first two criteria appear to be responsible for the misprediction of the form of 1GGZ and 2GGM. Moreover, concerning centrins with unknown structures, some experimental results (C. T. Craescu & S. Miron, unpublished data) in addition to the phylogenetic classification indicate that at least the CrCen family proteins should be in an extended form, which is not the case in the prediction based on the first two criteria.

The last two criteria in the sosuidumbbell algorithm are strongly dependent on the definition of the domains and the inter-domain linker. The delimitation of this linker is not always obvious: in the extended structures, it forms a helix in the continuity of helices D and E, whereas, in some compact conformations, it is a very short unstructured region (Fig. 2). In the sosuidumbbell algorithm, the linker considered may be too long and, consequently, the domains too short, as for calmodulin, where helices D and E, which belong to the N- and C-domains, respectively, are considered as parts of the linker [11]. In the present study, to determine the linker, we first identified the calcium-binding loops (Fig. 1), then we counted ten residues after loop II (corresponding to helix D) and ten residues before loop III (corresponding to helix E), and the remaining residues in between were considered as the inter-domain linker. Ten residues were considered for helices D and E because the experimental structural data show that a helix belonging to an EF-hand motif contains ten residues on average. Consequently, in the proteins investigated in the present study, the linker was between two and 20 residues long (Table 3), corresponding to 0.96% and 10.26%, respectively, of the protein sequence length.

Based on this definition of the linker, the charges of the N- and C-domains were calculated without considering the calcium ions. In Fig. 3C, we report the absolute value of the product of these charges, |QNQC|, which represents the charge balance between the domains. With the exception of the troponins, all the investigated proteins are characterized by products |QNQC| lower than 100, and therefore are predicted to be non-extended.

From these results, it is clear that, for CaBPs, the charges of the entire protein or of the separated domains are not responsible for the extended or compact form of the protein. This assertion is obvious in the case of human centrin 2 (HsCen2). In this protein, the first 25 amino acids, corresponding to a disordered region, are highly charged [24,25], with the net charge of this peptide being equal to +6 (it contains seven basic residues and one acidic residue). The X-ray structure of this protein was obtained in the presence [14] and in the absence [25] of these residues (PDB codes 2GGM and 2OBH, respectively). In both cases, HsCen2 adopts an extended conformation, showing that the charge balance of the domains does not play an important role for this protein. Nevertheless, in both cases, the sosuidumbbell algorithm predicts a non-extended form, which is not correct. Moreover, the structure of all the extended forms of the CaBPs considered in the present study was determined experimentally in the presence of calcium ions. Knowing that these ions significantly reduce the charges of the domains and therefore their electrostatic repulsions, calcium binding should favor the compact structure of CaBPs, which is not the case. The fourth criterion of the sosuidumbbell tool refers to the hydrophobicity of the central linker region, which is calculated using the Kyte & Doolittle Scale [26]. Uchikoga et al. [11] described the linker region of an extended protein as having an important negative hydrophobicity in its center (i.e. to be significantly hydrophilic), whereas its edges (helices D and
Table 3. Results for all the investigated CaBPs, grouped as extended, non-extended and unknown structures and numbered as in Figs 3 and 4. Columns: No.; protein (PDB code or abbreviation of the protein name); charges of the N-domain, C-domain and linker (QN, QC, Qlnk); net charge of the entire protein (Qprot); |QNQC|; dQ = |Qprot|/N; LAH; number of Pro, Gly, Trp and Phe residues in the linker plus three residues on each side; linker region; total number of residues (N); linker length (n); percentage of the protein length occupied by the linker (n/N × 100). [Table body not recoverable.]
Fig. 3. Test of the four criteria used in the SOSUIDUMBBELL algorithm. [Figure: panels A–C plot |Qprot|, dQ and |QNQC| against protein number; panel D plots hydrophobicity against relative residue number across helix D, the linker and helix E.] (A) The absolute net charge (|Qprot|) of investigated proteins without calcium ions versus the protein number from Table 3. The horizontal line corresponds to the limit of net charge between extended (|Qprot| > 20) and non-extended structures (|Qprot| ≤ 20) as considered by Uchikoga et al. [11]. Vertical lines separate the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). It can be seen that two extended structures are mispredicted (1GGZ and 2GGM) and that all the unknown-structure centrins are predicted to be non-extended. (B) The absolute net charge density (dQ) with a horizontal line limit at 0.14. (C) The absolute value of the product of the two domain charges (|QNQC|) in the absence of calcium ions with a horizontal line limit at 100. In this case, only troponin C molecules are predicted to be extended. (D) The hydrophobicity profile of the linker region and its surroundings using the Kyte & Doolittle Scale for two extended structures (dotted line, 1OSA; dashed line, 4TNC) and for a non-extended one (solid line, 1REC). For convenience of comparison, the three sequences were renumbered and centered on the linker. The zero point corresponds to residue number 92 in 4TNC, 81 in 1OSA and 98 in 1REC, which represents the center of the linker in each case.
E) are hydrophobic. In the present study, the same calculations were applied to all known-structure proteins, and it was observed that, in some cases, non-extended proteins (e.g. recoverin; 1REC) present the same hydropathy profile around the linker as extended proteins, such as calmodulin or troponin C (1OSA and 4TNC; Fig. 3D). Therefore, none of the criteria retained in the sosuidumbbell algorithm is completely reliable for predicting the form of the CaBPs. This motivated our search for other criteria.
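The quantities behind the four criteria discussed above can be sketched in Python. This is a minimal illustration, not the SOSUIDUMBBELL implementation: the function names, the 1-based loop positions `loop2_end` and `loop3_start`, the 7-residue smoothing window and the toy sequence are all assumptions introduced here. Charges count Lys, Arg and His as +1 and Asp and Glu as −1 (His positive and calcium omitted, as in the first step described in the text); ignoring the chain termini is a further assumption. The linker is delimited as in the present study, by skipping ten residues after loop II and ten residues before loop III.

```python
# Kyte & Doolittle hydropathy scale (positive values are hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Side-chain charges with His counted as positive; termini and calcium are ignored
# (ignoring the termini is an assumption of this sketch).
CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Sum of the side-chain charges of a sequence."""
    return sum(CHARGE.get(aa, 0) for aa in seq)


def split_domains(seq, loop2_end, loop3_start, helix_len=10):
    """Split a sequence into (N-domain, linker, C-domain).

    loop2_end and loop3_start are 1-based residue numbers (hypothetical inputs):
    the last residue of loop II and the first residue of loop III. Helix D is
    taken as the helix_len residues after loop II, helix E as the helix_len
    residues before loop III; what remains in between is the linker.
    """
    linker_start = loop2_end + helix_len       # 0-based index of the first linker residue
    linker_stop = loop3_start - 1 - helix_len  # 0-based index just past the last linker residue
    return seq[:linker_start], seq[linker_start:linker_stop], seq[linker_stop:]


def charge_criteria(seq, loop2_end, loop3_start):
    """Evaluate the three charge thresholds quoted in the text."""
    n_dom, _, c_dom = split_domains(seq, loop2_end, loop3_start)
    q_prot = abs(net_charge(seq))
    return {
        "|Qprot| > 20": q_prot > 20,
        "dQ > 0.14": q_prot / len(seq) > 0.14,
        "|QNQC| > 100": abs(net_charge(n_dom) * net_charge(c_dom)) > 100,
    }


def hydropathy_profile(seq, window=7):
    """Sliding-window mean of the Kyte & Doolittle values (window size assumed)."""
    half = window // 2
    return [
        sum(KD[aa] for aa in seq[i - half:i + half + 1]) / window
        for i in range(half, len(seq) - half)
    ]


# Toy sequence (not a real protein): 12 acidic residues, helix D, a 4-residue
# linker, helix E, 12 acidic residues.
TOY = "D" * 12 + "A" * 10 + "KKKK" + "A" * 10 + "E" * 12
print(split_domains(TOY, loop2_end=12, loop3_start=37)[1])
print(charge_criteria(TOY, loop2_end=12, loop3_start=37))
```

Keeping the linker delimitation in its own function makes it easy to compare a ten-residue helix convention with a longer linker definition, which is the point on which the present study departs from the original algorithm.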
Utilization of other criteria

Contact area

We analyzed the contact area between the domains of known-structure non-extended CaBPs. As expected, most of the residues at the interface were found to be hydrophobic. In most compact structures, a tryptophan (or less frequently a phenylalanine) located in one domain was buried in a hydrophobic cavity in the other domain, which would stabilize the compact structure. Unfortunately, this observation cannot be used as a predictive tool starting from the sequence because the aromatic residue is not located in a specific part of it. Indeed, the sequence of the linker and its close vicinity (three more residues from each side of the linker) does not always contain tryptophan or phenylalanine residues for compact forms (see 1REC, 1JBA, 1BJF and 2SCP in Table 3).

The presence of helix breakers

Prolines and, to a lesser extent, glycines, are well-known helix breakers. We investigated the presence of
FEBS Journal 276 (2009) 1082–1093 Journal compilation ª 2009 FEBS. No claim to original French government works
such residues in the linker or its vicinity (i.e. plus three residues on each side of the linker). The results presented in Table 3 show that, as expected, the presence of a Pro yields a non-extended form by breaking the central helix that constitutes the linker, but the reverse is not true because not all compact CaBPs contain a Pro in the linker. Therefore, this criterion cannot constitute a predictive rule. Moreover, concerning glycines, it was observed that both troponin C proteins (4TNC and 1TN4), which are extended, have one Gly in the linker, as do bovine recoverin (1REC), guanylate cyclase activating protein 2 (1JBA) and bovine neurocalcin δ (1BJF), which present very compact structures.
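The search for helix breakers described above amounts to counting Pro and Gly residues in the linker extended by three residues on each side. A minimal sketch of such a count is given below; the function name, the 1-based inclusive linker limits and the returned dictionary are choices made for this example and are not taken from the original software.

```python
from collections import Counter


def helix_breakers_near_linker(seq, linker_start, linker_end, flank=3):
    """Count Pro and Gly in the linker plus `flank` residues on each side.

    linker_start and linker_end are 1-based, inclusive residue numbers
    (an assumption of this sketch). The window is clipped at the
    sequence ends.
    """
    first = max(1, linker_start - flank)
    last = min(len(seq), linker_end + flank)
    region = seq[first - 1:last]
    counts = Counter(region)
    return {"region": region, "Pro": counts["P"], "Gly": counts["G"]}
```

As the text notes, a non-zero Pro count points to a non-extended form, whereas a zero count, or the Gly count on its own, does not allow any conclusion.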
Fig. 4. The LAH for the investigated proteins, plotted against the protein number. The horizontal line separates the predicted extended structures (LAH > 1.4) from the predicted non-extended ones (LAH ≤ 1.4). Vertical lines separate the known extended structures (filled circles), the known non-extended structures (open diamonds) and the unknown structures of centrins (filled triangles). For the unknown-structure centrins, we indicate the phylogenetic subfamilies.

Net electric charge of the linker

It might be assumed that the net electric charge of the linker plays a role if there is repulsion between this linker and the adjacent domains. Thus, this property was investigated (Table 3) but did not yield a good discriminating criterion because, in HsCen2 (2GGM), which is extended, the linker is neutral, as in bovine neurocalcin δ (1BJF) or amphioxus sarcoplasmic calcium-binding protein (2SAS), which are non-extended structures.
Hydrophilicity of the linker
The criterion that yielded the best results was based on the hydrophilicity of the linker. It was obtained by the procedure detailed below. First, the hydrophilicity (h_i) of each residue i of the protein was calculated using the Hopp & Woods Scale [27] with a nine-residue sliding window. In this scale, positive values correspond to hydrophilic positions.
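The smoothing step just described, together with the averaging over the linker that the text goes on to define, can be sketched as follows. This is an illustrative reading of the procedure and not the authors' program. The scale values are the Hopp & Woods values as commonly tabulated and should be checked against the ProtScale table. The function names, the 1-based residue numbering, the choice to leave positions without a full window undefined, and the reading of the linker as the residues strictly between J + 10 and K − 10 are assumptions of this sketch.

```python
# Hopp & Woods hydrophilicity values as commonly tabulated
# (positive = hydrophilic); verify against the ProtScale table.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def smoothed_hydrophilicity(seq, window=9):
    """Unweighted sliding-window mean of the raw hydrophilicities.

    Positions closer than window // 2 to either end have no full
    window and are left as None (an assumption of this sketch).
    """
    half = window // 2
    raw = [HOPP_WOODS[aa] for aa in seq]
    profile = [None] * len(seq)
    for i in range(half, len(seq) - half):
        profile[i] = sum(raw[i - half:i + half + 1]) / window
    return profile


def linker_average_hydrophilicity(seq, j, k, window=9):
    """Average smoothed hydrophilicity over residues J+11 .. K-11.

    j: 1-based number of the last residue of calcium-binding loop II.
    k: 1-based number of the first residue of calcium-binding loop III.
    """
    profile = smoothed_hydrophilicity(seq, window)
    values = [profile[i - 1] for i in range(j + 11, k - 10)]
    if not values or any(v is None for v in values):
        raise ValueError("linker is empty or too close to a sequence end")
    return sum(values) / len(values)
```

Passing `window=1` reproduces the raw, unsmoothed hydrophilicity, which makes it easy to compare the two variants on the same sequence.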
Second, the linker was determined as described above: if the last residue of the calcium-binding loop II is denoted J and the first residue of the calcium-binding loop III is denoted K, the linker consists of all residues comprised in the open interval ]J + 10, K − 10[.

Finally, the LAH was calculated:

$$\mathrm{LAH} = \frac{1}{n} \sum_{i \,\in\, ]J+10,\;K-10[} h_i$$

where n is the number of residues in the linker and h_i is the hydrophilicity at position i of the linker.

This procedure was applied to all proteins in Tables 1 and 2. The results are presented in Fig. 4. Remarkably, the LAH values discriminated well between the extended and non-extended forms of the known structures of the CaBPs, with two distinct sets of points, where LAH was greater than 1.6 for the extended forms and < 1.2 for the others. Therefore, an average value of 1.4 was considered as the threshold above which a two-domain EF-hand protein is extended. Moreover, one of the reviewers of the present study suggested the case of calcineurin B-like protein 2 from Arabidopsis (SwissProt code: Q8LAS7, PDB code: 1UHN), which we had omitted from our sample. The protein consists of 226 residues and the linker of five residues (residues 117–121). The calculated LAH value is 0.2978, predicting a compact structure, in good agreement with the 3D structure of the protein. Considering centrins with unknown structures, it can be seen that the LAH values reflect well the phylogenetic classification, although this classification is based on the entire sequence, whereas LAH is based on only a few residues in the linker region.

To determine whether the discrimination potency of the linker average hydrophilicity is fortuitous or not, LAH values were plotted against the radius of gyration of the known structures in Fig. 5. A clear correlation is demonstrated between these two features, with a correlation coefficient equal to 0.82 and a Student coefficient of 36.98 (for 16 degrees of freedom that correspond to 17 points), indicating that the probability of this correlation being random is < 0.001. The LAH algorithm is available at: http://u759.curie.u-psud.fr/modelisation/LAH.
The predictive potency of the present method depends on the determination of the linker limits, which must be defined objectively. To find such a definition, several delimitations were tested, including the one used in the sosuidumbbell tool. We have observed that considering long linkers, which overlap adjacent helices, does not allow us to discriminate between the different forms of CaBPs because the results were polluted by the nature of the extra residues, whereas the shortest possible linkers provided the most reliable way to discriminate between the extended and compact forms. However, it must be noted that the influence of four neighboring residues at both ends of the linker is taken indirectly into account because of the nine-residue window used in the calculations of hydrophilicity. Raw hydrophilicity data (equivalent to a one-residue window) were also tested to check the importance of this influence. The results were qualitatively similar to those obtained with the nine-residue window with respect to the prediction of the form of the protein, but the correlation between LAH and the radius of gyration was less evident. Moreover, this discrimination was possible when calculating LAH with the Hopp & Woods Scale for hydropathy. Three other scales were tested (Kyte & Doolittle [26], Miyazawa & Jernigan [28] and Janin [29]) but did not provide satisfactory results. This is mainly due to the scores attributed to the Asn, Gln and Trp residues, which are considered to be much more hydrophilic in these scales than in the Hopp & Woods Scale.

Applying the LAH method to centrins showed that the CrCen-like proteins are predicted to be extended, which is in good agreement with the known structure of one member of this family, HsCen2 [14,25]. The Cdc31-like family is predicted to be in the non-extended form, which is also in good agreement with the known structure of ScCdc31 [23]. There is no experimental information about the other centrins, but we predict that members of the AtCen family are in an extended form, similar to the CrCen family, and that the PtICL family is divided into two sets: the extended proteins (ICL1a, ICL3a and ICL11 subfamilies) and the non-extended ones (ICL1e, ICL3b, ICL5, ICL7, ICL8, ICL9 and ICL10 subfamilies).

Fig. 5. The radius of gyration (Å) of the known-structure CaBPs versus their LAH. The straight line shows the linear fit of the points. The correlation coefficient is 0.82.

Conclusions

The results obtained in the present study indicate that the extended and compact forms of EF-hand proteins do not necessarily depend on the electric charge of the domains, but are mainly determined by the hydrophilicity (as determined by the Hopp & Woods Scale) of the residues that link the two domains. The definition of the linker is very important and should not include residues from the adjacent helices. What is remarkable is that, once the linker is defined objectively, the nature of its residues appears to determine the form of the CaBP, whatever the length of this linker; it can be as long as 20 residues, as in calcineurin B homologous protein 1, 2CT9 (representing approximately 10% of the protein length; Table 3), or as short as two residues, as in P. tetraurelia infraciliary lattice centrin 9, PtICL9 (< 1% of the protein length). However, the length of the linker in the set of proteins considered in the present study is approximately five residues on average, which is rather short. This indicates that the form of CaBPs is likely governed by short-distance interactions.

Experimental procedures

Seventeen CaBPs with known structures, two of them being centrins, in addition to 59 centrins with unknown structures, were considered in the present study.

Choice of the proteins

CaBPs with known structures were taken from the PDB [12]. Only proteins containing four EF-hand motifs were considered. The chosen structures had to meet several criteria.

First, the proteins had to be in their unbound state (i.e. not in complex with their target peptides because peptide binding may cause conformational changes of the entire protein). There were, however, two exceptions: human centrin 2 (2GGM) and yeast centrin (2DOQ), in which the peptide interacts with only one domain (C-domain) and therefore does not modify the relative position of the two domains. In addition, these two structures were the only ones available in the PDB for this family of proteins.

Second, EF-hand proteins whose extended structure had been resolved by NMR were discarded because they did not provide enough information concerning the relative positions of their domains.
Third, only three families of known extended structures of CaBPs were found in the PDB: troponin C, calmodulin and human centrin 2. The chosen structures in each family had to share the least possible sequence identity. The most divergent ones shared between 78% and 90% identity. However, the sequence identity between the three families did not exceed 50%.

Fourth, all the non-extended structures consisting of two domains, with a linker containing more than one residue, were kept. They share between 5% and 50% sequence identity.

This left a set of 17 CaBPs with known structures: seven extended and ten non-extended forms. It should be noted that one of these structures is a mutant protein, the rabbit troponin C (1TN4) [30], where Cys98 was replaced by Leu. This residue is located in helix E and does not modify the extended structure of the protein.

Concerning the unknown CaBP structures, we considered the three well-characterized phylogenetic families of centrin, in addition to all the PtICLs. Inside each family (or subfamily for the PtICL), the sequence identity was in the range 60–98%, whereas, between different families, it was in the range 11–50%.

Sequence alignments

Sequence alignments were used to identify the calcium loops and therefore to delimit the linker as described in the Results and Discussion. They were performed with clustalw [31] (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Known structures were aligned together using the default settings. Unknown structures were aligned separately by families (four distinct families and four distinct alignments). Each alignment included all possible subfamilies, in addition to calmodulin, to help identify the calcium loops. No structural alignments were taken into account.

Form prediction

To predict the form of the structures (extended or not), all the sequences of our CaBP sample were introduced in the sosuidumbbell algorithm [11]. Then, to analyze the reasons for the misprediction of some structures, we used bespoke software that was based on the same criteria as the sosuidumbbell algorithm. Similar to the latter, only the electric charge of basic and acidic residues (in addition to His) was taken into account, but not the charge of the N- and C-termini of the protein.

Calculation of the hydrophilicity

The hydrophilicity of each protein was calculated using the Hopp & Woods Scale [27] available on the ExPASy server [32] (http://us.expasy.org/tools/protscale.html). The default parameters were conserved (i.e. the window size was equal to nine residues, with a relative weight of the window edges compared to its center equal to 100%). The scale was not normalized. In more detail, the 'smoothed' hydrophilicity h_i was calculated for each residue i of the protein by averaging the raw hydrophilicities over the residues of the sliding window (here i − 4 to i + 4). Then, to obtain LAH, only the values for the linker residues were averaged again and taken into account.

Several other hydrophobicity scales were also used (which are available on the ExPASy server) either for comparison with the Hopp & Woods Scale or for verification of the sosuidumbbell criteria. Kyte & Doolittle [26], Miyazawa & Jernigan [28] and Janin [29] scales were used with the same default parameters.

Acknowledgements

This work was supported by the Institut National de la Santé et de la Recherche Médicale (INSERM) and the Institut Curie. We acknowledge the financial support of the EGIDE (ECO-NET project 16342RH) and a FEBS short-term fellowship for A. Isvoran.

References

1 Carafoli E (2002) Calcium signaling: a tale for all seasons. PNAS 99, 1115–1122.
2 Babu YS, Bugg CE & Cook WJ (1988) Structure of calmodulin refined at 2.2 Å resolution. J Mol Biol 204, 191–204.
3 Nalefski EA & Falke JJ (1996) The C2 domain calcium-binding motif: structural and functional diversity. Protein Sci 5, 2375–2390.
4 Krebs J & Heizmann CW (2007) Calcium binding proteins and the EF-hand principles. In Calcium: A Matter of Life or Death (Krebs J & Michalak M, eds), pp. 49–132. Elsevier BV, Oxford.
5 Raynald P & Pollard HB (1994) Annexins: the problem of assessing the biological role for a gene family of multifunctional calcium- and phospholipid-binding proteins. Biochim Biophys Acta 1197, 63–93.
6 Carafoli E (2003) The calcium signaling saga: tap water and protein crystals. Nature 4, 326–332.
7 Chard PS, Bleakman D, Christakos S, Fullmer CS & Miller RJ (1993) Calcium buffering properties of calbindin D28k and parvalbumin in rat sensory neurones. J Physiol 472, 341–357.
8 Christova P, Cox JA & Craescu CT (2000) Ion-induced conformational and stability changes in Nereis sarcoplasmic calcium binding protein: evidence that the APO state is a molten globule. Proteins Struct Funct Bioinformatics 40, 177–184.
9 Satyshur KA, Rao ST, Pyzalska D, Drendel W, Greaser M & Sundaralingam M (1988) Refined structure of chicken skeletal muscle troponin C in the two-calcium state at 2 Å resolution. J Biol Chem 263, 1628–1647.
10 Humphrey W, Dalke A & Schulten K (1996) VMD – visual molecular dynamics. J Mol Graph 14, 33–38.
11 Uchikoga N, Takahashi SY, Ke R, Sonoyama M & Mitaku S (2005) Electric charge balance mechanism of extended soluble proteins. Protein Sci 14, 74–80.
12 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN & Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28, 235–242.
13 Han BG, Han M, Sui H, Yaswen P, Walian PJ & Jap BK (2002) Crystal structure of human calmodulin-like protein: insights into its functional role. FEBS Lett 521, 24–30.
14 Thompson JR, Ryan ZC, Salisbury JL & Kumar R (2006) The structure of the human centrin 2-xeroderma pigmentosum group C protein complex. J Biol Chem 281, 18746–18752.
15 Schiebel E & Bornens M (1995) In search of a function for centrins. Trends Cell Biol 5, 197–201.
16 Zamora I & Marchall FW (2005) A mutation in the centriole-associated protein centrin causes genomic instability via increased chromosome lost in Chlamydomonas reinhardtii. BMC Biol 3, 15–22.
17 Salisbury JL, Suino KM, Busby R & Springett M (2002) Centrin-2 is required for centriole duplication in mammalian cells. Curr Biol 12, 1287–1292.
18 Wiech H, Geier BM, Paschke T, Spang A, Grein K, Steinkotter J, Melkonian M & Schiebel E (1996) Characterization of green algae, yeast and human centrins. J Biol Chem 271, 22453–22461.
19 Resendes KK, Rasala BA & Forbes DJ (2008) Centrin 2 localizes to the vertebrate nuclear pore and plays a role in mRNA and protein export. Mol Cell Biol 28, 1755–1769.
20 Wolfrum U, Giebl A & Pulvermuller A (2002) Centrins, a novel group of Ca2+-binding proteins in vertebrate photoreceptor cells. In Photoreceptors and Calcium (Baehr W & Palczewski K, eds), pp. 155–178. Kluwer Academic ⁄ Plenum Publishers, New York, NY.
21 Azimzadeh J & Bornens M (2004) The centrosome in evolution. In Centrosome in Development and Disease (Nigg EA, ed.), pp. 93–112. Wiley-VCH Verlag GmbH&Co KGaA, Weinheim.
22 Gogendeau D, Beisson J, Garreau de Loubresse N, Le Caer JP, Ruiz F, Cohen J, Sperling L, Koll F & Klotz C (2007) A sfi1p-like centrin-binding protein mediates centrin based Ca2+ dependent contractility in Paramecium. Eucaryot Cell 6, 1992–2000.
23 Li S, Sandercock AM, Conduit PT, Robinson CV, Williams RL & Kilmartin JV (2006) Structural role of Sfi1p-centrin filaments in budding yeast spindle pole body duplication. J Cell Biol 73, 867–877.
24 Yang A, Miron S, Duchambon P, Assairi L, Blouquit Y & Craescu CT (2006) The N-terminal domain of human centrin 2 has a closed structure, binds calcium with a very low affinity and plays a role in the protein self-assembly. Biochemistry 45, 880–889.
25 Charbonnier JB, Renaud E, Miron S, Le Du MH, Blouquit Y, Duchambon P, Christova P, Shosheva A, Rose T, Angulo JF et al. (2007) Structural, thermodynamic, and cellular characterization of human centrin 2 interaction with xeroderma pigmentosum group C protein. J Mol Biol 373, 1032–1046.
26 Kyte J & Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105–132.
27 Hopp TP & Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. PNAS 78, 3824–3828.
28 Miyazawa S & Jernigan RL (1985) Estimation of effective inter-residue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
29 Janin J (1979) Surface and inside volumes in globular proteins. Nature 277, 491–492.
30 Houdusse A, Love ML, Dominguez R, Grabarek Z & Cohen C (1997) Structures of four Ca2+-bound troponin C at 2.0 Å resolution: further insights into the Ca2+-switch in the calmodulin superfamily. Structure 5, 1695–1711.
31 Thompson JD, Higgins DG & Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
32 Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD & Bairoch A (2005) Protein identification and analysis tools on the ExPASy server. In The Proteomics Protocols Handbook (Walker JM, ed), pp. 571–607. Humana Press, Totowa, NJ.