MINIREVIEW
The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression
Christophe Maris*, Cyril Dominguez* and Frédéric H.-T. Allain
Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg, Zürich, Switzerland
Keywords RNA recognition motif; protein–RNA complex; structure–function relationship; RNA-binding specificity
Correspondence
F. H.-T. Allain, Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg, CH-8093 Zürich, Switzerland
Fax: +41 1 6331294
Tel: +41 1 6333940
E-mail: allain@mol.biol.ethz.ch
Website: http://www.mol.biol.ethz.ch/groups/allain_group
*These authors contributed equally to the work
(Received 16 December 2004, accepted 7 March 2005)
doi:10.1111/j.1742-4658.2005.04653.x
The RNA recognition motif (RRM), also known as RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), is one of the most abundant protein domains in eukaryotes. Based on the comparison of more than 40 structures including 15 complexes (RRM–RNA or RRM–protein), we reviewed the structure–function relationships of this domain. We identified and classified the different structural elements of the RRM that are important for binding a multitude of RNA sequences and proteins. Common structural aspects were extracted that allowed us to define a structural leitmotif of the RRM–nucleic acid interface with its variations. Outside of the two conserved RNP motifs that lie in the center of the RRM β-sheet, the two external β-strands, the loops, the C- and N-termini, or even a second RRM domain allow high RNA-binding affinity and specific recognition. Protein–RRM interactions that have been found in several structures reinforce the notion of an extreme structural versatility of this domain supporting the numerous biological functions of the RRM-containing proteins.
History – what defines an RRM?
Abbreviations ACF, APOBEC-1 complementary factor; CBP, cap binding protein; CstF, cleavage stimulation factor; hnRNP, heterogeneous nuclear ribonucleoprotein; HuD, Hu protein D; LRR, leucine rich repeat; MIF4G, middle domain of the translation initiation factor 4G; PABP, polyadenylate binding protein; PIE, polyadenylation inhibition element; PTB, polypyrimidine tract binding protein; RBD, RNA-binding domain; RNP, ribonucleoprotein; RRM, RNA recognition motif; SR, serine/arginine rich proteins; TLS, translocated in liposarcoma; U1A, U2A′, U2B″: U1 snRNP proteins A, A′, B″; U2AF, U2 snRNP auxiliary factor; UHM, U2AF homology motif; UPF, up-frameshift protein.
FEBS Journal 272 (2005) 2118–2131 © 2005 FEBS
The RNA recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), was first identified in the late 1980s when it was demonstrated that mRNA precursors (pre-mRNA) and heterogeneous nuclear RNAs (hnRNAs) are always found in complex with proteins (reviewed in [1]). Biochemical characterizations of the mRNA polyadenylate binding protein (PABP) and the hnRNP protein C shed light on a consensus RNA-binding domain of approximately 90 amino acids containing a central sequence of eight conserved residues that are mainly aromatic and positively charged [2,3]. This sequence, termed the RNP consensus sequence, was thought to be involved in RNA interaction and was defined as Lys/Arg-Gly-Phe/Tyr-Gly/Ala-Phe/Tyr-Val/Ile/Leu-X-Phe/Tyr, where X can be any amino acid. Later, a second consensus sequence less conserved than the previously characterized one [1] was identified. This six residue sequence located at the N-terminus of the domain
C. Maris et al.
The RRM domain, a plastic RNA-binding platform
[Fig. 1 alignment. Labels above the alignment: RNP2, RNP1. Secondary structure elements and loops marked below it: β1, L1, α1, L2, β2, L3, β3, L4, α2, L5, β4. Alignment column ruler: 10 20 30 40 50 60 70 80. Each row gives the protein (PDB code), the first residue number, the aligned sequence and the last residue number.]
PTB (1SJQ)  60  VIHIRKLPIDVTEGEVISLGLP-----FGKVTNL------LMLKG-----KNQAFIEMNTEEAANTMVNYYTSVTPVLRGQPIYIQ  147
PTB (1SRJ)  183  RIIVENLFYPVTLDVLH-QIFSK----FGTVLKI-----ITFTKNN----QFQALLQYADPVSAQHAKLSLDGQNIYNACCTLRID  282
PTB (1QM9)  338  VLLVSNLNPERVTPQSLFILFGV----YGDVQRV-----KILFNK-----KENALVQMADGNQAQLAMSHLNGHKLH--GKPIRIT  407
PTB (1QM9)  455  TLHLSNIPPSVSEEDLK-VLFSS----NGGVVKG-----FKFFQKD----RKMALIQMGSVEEAVQALIDLHNHDLG-ENHHLRVS  531
Cstf-64 (1P1T)  17  SVFVGNIPYEATEEQLK-DIFSE----VGPVVSF-----RLVYDRETGKPKGYGFCEYQDQETALSAMRNLNGREFS--GRALRVD  90
LA (1OWX)  244  LKFSGDLDDQTCREDLHILFSNH----GEIK--------WIDFVRGA--KEGIILFKEKAKEALGKAKDANNGNLQLRNKEVTWEV  305
TAP (1FO1)  121  KITIPYGRKYDK-AWLLSMIQSKCSVPFTPIEFHYENTRAQFFVEDASTASALKAVNYKILDRENRRISIIINSSAP----PHS  290
ALY (1NO8)  106  KLLVSNLDFGVSDADIQ-ELFAE----FGTLKKA-----AVHYDRSGR-SLGTADVHFERKADALKAMKQYNGVPLD--GRPMNIQ  178
hnRNP A1 (1UP1)  15  KLFIGGLSFETTDESLR-SHFEQ----WGTLTDC-----VVMRDPNTKRSRGFGFVTYATVEEVDAAMNARP-HKVD--GRVVEPK  87
hnRNP A1 (1HA1)  105  KIFVGGIKEDTEEHHLR-DYFEQ----YGKIEVI-----EIMTDRGSGKKRGFAFVTFDDHDSVDKIVIQKY-HTVN--GHNCEVR  177
HUD (1FXL)  47  NLIVNYLPQNMTQEEFR-SLFGS----IGEIESC-----KLVRDKITGQSLGYGFVNYIDPKDAEKAINTLNGLRLQ--TKTIKV  119
HUD (1FXL)  133  NLYVSGLPKTMTQKELE-QLFSQ----YGRIITS-----RILVDQVTGVSRGVGFIRFDKRIEAEEAIKGLNGQKPSGATEPITVK  206
SXL (2SXL)  126  NLIVNYLPQDMTDRELY-ALFRA----IGPINTC-----RIMRDYKTGYSYGYAFVDFTSEMDSQRAIKVLNGITVR--NKRLKV  199
SXL (1SXL)  212  NLYVTNLPRTITDDQLD-TIFGK----YGSIVQK-----NILRDKLTGRPRGVAFVRYNKREEAQEAISALNNVIPEGGSQPLSVR  290
PABP (1CVJ)  12  SLYVGDLHPDVTEAMLY-EKFSP----AGPILSI-----RVCRDMITRRSLGYAYVNFQQPADAERALDTMNFDVIK--GKPVRI  84
PABP (1CVJ)  99  NIFIKNLDKSIDNKALYDTFSAF----GNILSCK------VVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKS  175
Nucleolin (1FJE)  309  NLFIGNLNPNKSVAELKVAISEL----FAKND-------LAVVDVRTGTNRKFGYVDFESAEDLEKAL-ELTGLKVF--GNEIKLE  380
Nucleolin (1FJE)  396  LLAKNLSFNITEDELKEVFEDAL----EIRLVSQ----------DGKSKGIAYIEFKS--EADAEKNLEEKQGAEID--GRSVSLY  463
U1A (1DZ5)  11  TIYINNLNEKIKKDELKKSLYAI----FSQFGQI-----LDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFY--DKPMRIQ  85
U2B″ (1A9N)  8  TIYINNMNDKIKKEELKRSLYAL----FSQFGHV-----VDIVALKTMKMRGQAFVIFKELGSSTNALRQLQGFPFY--GKPMRI  81
CBP20 (1H2T)  41  TLYVGNLSFYTTEEQIY-ELFSK----SGDIKKI-----IMGLDKMKKTACGFCFVEYYSRADAENAMRYINGTRLD--DRIIRTD  114
Y14 (1P27)  74  ILFVTGVHEEATEEDIH-DKFAE----YGEIKNI-----HLNLDRRTGYLKGYTLVEYETYKEAQAAMEGLNGQDLM--GQPISVD  147
UPF3 (1UW4)  52  KVVIRRLPPTLTKEQLQEHLQPM----PEHDYFE----FFSNDTSLYPHMYARAYINFKNQEDIILFRDRFDGYVFLDNKGQEYPA  131
U2AF65 (1U2F)  150  RLYVGNIPFGITEEAMM-DFFNAQMR-LGGLTQAPG---NPVLAVQINQDKNFAFLEFRSVDETTQAM-AFDGIIFQ--GQSLKIR  227
U2AF65 (2U2F)  260  KLFIGGLPNYLNDDQVK-ELLTS----FGPLKAF-----NLVKDSATGLSKGYAFCEYVDINVTDQAIAGLNGMQLG--DKKLLVQ  333
U2AF35 (1JMT)  66  RSAVSDVEMQEHYDEFFEEVFTEMEEKYGEVEEM-----NVC-DNLGDHLVGNVYVKFRREEDAEKAVIDLNNRWFN--GQPIHA  143
Fig. 1. Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets). The alignment was generated by the program CLUSTALW (http://www.ebi.ac.uk/clustalw/) [55] and manually optimized. The conserved RNP 1 and RNP 2 sequences are displayed in yellow. The amino acids highlighted in boxes refer to the aromatic residues important for primary RNA binding.
was defined as Ile/Val/Leu-Phe/Tyr-Ile/Val/Leu-X-Asn-Leu. The first consensus sequence was therefore referred to as RNP 1 and the second as RNP 2 (Fig. 1). It was then shown that this protein domain was necessary and sufficient for binding RNA molecules with a wide range of specificities and affinities (reviewed in [4–6]). Here we review the structural properties of the RRM domain in its isolated form and in complex with RNAs and/or proteins. This review shows how such a simple domain can modulate its fold to recognize many RNAs and proteins in order to achieve a multitude of biological functions often associated with post-transcriptional gene regulation.
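The two consensus sequences quoted above can be written as simple patterns in one-letter amino acid code. The following is a minimal sketch, not a method used in this review: the regular expressions are a literal translation of the RNP 1 and RNP 2 definitions, the function name `find_rnp_motifs` is hypothetical, and the two test sequences are rows copied from the Fig. 1 alignment. Strict matching is an assumption made for illustration only, since many genuine RRMs deviate from the consensus.

```python
import re

# One-letter-code translations of the two consensus motifs quoted in the text.
# "." stands for X (any amino acid). Strict matching is an illustrative
# assumption; real RRMs often deviate from the consensus.
RNP1 = re.compile(r"[KR]G[FY][GA][FY][VIL].[FY]")
RNP2 = re.compile(r"[IVL][FY][IVL].NL")


def find_rnp_motifs(aligned_sequence):
    """Return (motif, 0-based start, matched residues) for each strict hit."""
    ungapped = aligned_sequence.replace("-", "")  # drop alignment gaps
    hits = []
    for name, pattern in (("RNP2", RNP2), ("RNP1", RNP1)):
        for match in pattern.finditer(ungapped):
            hits.append((name, match.start(), match.group()))
    return hits


# Two rows copied from the Fig. 1 alignment.
PABP_RRM2 = "NIFIKNLDKSIDNKALYDTFSAF----GNILSCK------VVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKS"
HNRNPA1_RRM1 = "KLFIGGLSFETTDESLR-SHFEQ----WGTLTDC-----VVMRDPNTKRSRGFGFVTYATVEEVDAAMNARP-HKVD--GRVVEPK"

for label, seq in (("PABP RRM 2", PABP_RRM2), ("hnRNP A1 RRM 1", HNRNPA1_RRM1)):
    print(label, find_rnp_motifs(seq))
```

With these two rows, the second RRM of PABP matches both strict patterns, whereas the first RRM of hnRNP A1 matches only RNP 1: its RNP 2 segment (LFIGGL) carries Gly where the consensus has Asn. This illustrates why the second motif is described as less conserved.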
An abundant and ancient fold with multiple biological functions
Genome sequencing projects recently showed that the RRM is found abundantly in all life kingdoms, including prokaryotes and viruses although at lower abundance than in eukaryotes. To date, only 85 proteins containing an RRM domain in bacteria (mostly cyanobacteria [7]), and six such proteins in viruses have been identified. Prokaryotic RRM proteins are rather small (about 100 amino acids) and have a single copy of the RRM domain. In eukaryotes, the RNA recognition motif is one of the most abundant protein domains. To date, a total of 6056 RRM motifs have been identified in 3541 different proteins (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00076) [8]. In humans, 497 proteins containing at least one RRM have been identified. Assuming about 20 000–25 000 human genes, the RRM would therefore be present in about 2% of gene products. In eukaryotic proteins, RRMs are often found as multiple copies within a protein (44%, two to six RRMs) and/or together with other domains (21%). Among the latter, the most abundant are the zinc fingers of the CCCH and CCHC type (21% of those with an additional domain), the polyadenylate binding protein C-terminal domain (PABP or PABC, 10%), and the WW domain (9%). Interestingly, contrary to the well known CCHHs that bind double-stranded DNA or RNA, the CCCH and CCHC zinc fingers are domains that bind single-stranded RNA [9,10]. The PABP and the WW domains [11] are protein–protein interaction domains involved in translation [12,13] and pre-spliceosome formation [14], respectively. By association with different types of protein domains, the RRM domain can modulate its RNA-binding affinity and specificity and diversify its biological functions.
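The abundance figures quoted above can be checked with a few lines of arithmetic. This is only a sketch under the text's own assumption of 20 000–25 000 human genes, with the additional simplifying assumption (ours, not the authors') that each gene yields one product; the variable and function names are illustrative.

```python
# Counts quoted in the text (Pfam figures as cited there).
RRM_MOTIFS = 6056
RRM_PROTEINS = 3541
HUMAN_RRM_PROTEINS = 497
HUMAN_GENE_ESTIMATES = (20_000, 25_000)  # range assumed in the text


def percent_of_gene_products(rrm_proteins, gene_count):
    """Share of gene products carrying an RRM, assuming one product per gene."""
    return 100 * rrm_proteins / gene_count


motifs_per_protein = RRM_MOTIFS / RRM_PROTEINS
print(f"average RRM copies per RRM protein: {motifs_per_protein:.2f}")

for genes in HUMAN_GENE_ESTIMATES:
    share = percent_of_gene_products(HUMAN_RRM_PROTEINS, genes)
    print(f"{genes} genes: {share:.1f}% of gene products")
```

The lower gene estimate gives a share of roughly 2.5% and the upper estimate roughly 2.0%, which is the range behind the rounded "about 2%" in the text.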
A protein domain in such abundance is necessarily biologically important and associated with many functions in the cell. Indeed, eukaryotic RRM proteins are present in all post-transcriptional events: pre-mRNA processing (for example CstF-64, LA, or UPF3 proteins), splicing (U2B″, U2AF35, U2AF65, hnRNPA1 or Y14 proteins), alternative splicing (hnRNPA1, PTB, sex-lethal, SR proteins), mRNA stability (CBP20, PABP or HuD), RNA editing (ACF), mRNA export (TLS), pre-rRNA complex formation (nucleolin), translation regulation (PABP) and degradation [6]. In plants, RRM proteins are present in chloroplasts and are involved in 3′ end processing of chloroplast mRNA [15]. They have also been discovered in plant mitochondria. Their functions, however, remain unclear [16]. Similarly, their roles in bacteria and viruses are still unknown. The numerous three-dimensional structures of the RRM in isolation, and in complex with RNA or other proteins, shed light on the function of RRM proteins, as shown below.

The structure of the RRM, a βαββαβ fold with some variations and extensions

Fig. 2. hnRNPA1 RRM 2, a typical RRM fold and its structural variations as illustrated by these different protein structures (hnRNPA1 RRM 2 [52], PTB RRM 3 [23], La C-terminal [20], CstF-64 RRM [22] and U2AF35 [51]). This figure was generated with the program MOLMOL [56].

The RRM folds into an αβ sandwich structure with a β1α1β2β3α2β4 topology (Figs 1 and 2) as demonstrated by the first structure of an RNA recognition motif, the N-terminal RRM of U1A [17]. The fold is composed of one four-stranded antiparallel β-sheet spatially arranged in the order β4β1β3β2 from left to right when facing the sheet (Fig. 2, hnRNP A1-RRM 2, front view) and two α-helices (α1 and α2) packed against the β-sheet. Most of the conserved residues of the RRM are in the hydrophobic core of the domain [17] except four conserved residues that contribute to RNA binding, namely RNP 1 positions 1, 3 and 5 and RNP 2 position 2 (see the following section and Fig. 1). The RNP 1 and RNP 2 motifs are located in the central strands of the β-sheet, namely β3 and β1, respectively, and are highly conserved apart from a few RRM domains such as ALY and TAP (Fig. 1) [18,19].

To date, more than 30 RRM structures have been determined either by NMR or X-ray crystallography and reveal unexpected variations as shown in Fig. 2. The loops between the secondary structure elements (loops 1–5 as indicated in Figs 1 and 2) can have different lengths and are often disordered in the free form. An exception to this is loop 5 that often forms a small two-stranded β-sheet (β3′ and β3″) (Fig. 2). The N- and C-terminal regions, outside the RRM, are usually poorly ordered in the isolated domains with a few exceptions where they can adopt a secondary structure (Fig. 2, PTB-RRM 3, La C-terminal RRM and CstF-64). In the structures of La C-terminal RRM [20], U1A N-terminal RRM [21] and CstF-64 RRM [22], the C-terminus forms an α-helix that lies on the β-sheet surface, while in PTB-RRM 2 and 3 it extends the size of the β-sheet by forming an extra β-strand (β5) antiparallel to β2 [23,24]. CstF-64 RRM also has an additional short α-helix in its N-terminal region (Fig. 2) [22]. Finally, secondary structure elements of the domain can be modified; for example, α-helix 1 in U2AF35 RRM is three times longer than in a canonical RRM (Fig. 2). This unusual helix 1 is involved in protein–protein interactions [25] (see the RRM–protein complexes section).
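The canonical topology described in this section can be captured in a small data structure, which makes the relationship between chain order, sheet order and the RNP motifs explicit. This is an illustrative sketch only: the names are hypothetical, and numbering the loops in chain order (loop 1 between β1 and α1, up to loop 5 between α2 and β4) is our reading of the loop labels in Fig. 1.

```python
# Element order along the chain and strand order across the sheet, as stated
# in the text. Variable and function names are illustrative only.
CHAIN_TOPOLOGY = ["β1", "α1", "β2", "β3", "α2", "β4"]
SHEET_ORDER = ["β4", "β1", "β3", "β2"]  # left to right, facing the sheet
MOTIF_STRAND = {"RNP2": "β1", "RNP1": "β3"}


def number_loops(chain_topology):
    """Number the connections between consecutive elements (loop 1, loop 2, ...)."""
    pairs = zip(chain_topology, chain_topology[1:])
    return {f"loop {i}": pair for i, pair in enumerate(pairs, start=1)}


def central_strands(sheet_order):
    """Strands flanked by a neighbouring strand on both sides."""
    return sheet_order[1:-1]


def edge_strands(sheet_order):
    """The two external strands of the sheet."""
    return [sheet_order[0], sheet_order[-1]]


loops = number_loops(CHAIN_TOPOLOGY)
print(loops)
print("central:", central_strands(SHEET_ORDER), "edges:", edge_strands(SHEET_ORDER))
print("RNP motifs on central strands:",
      set(MOTIF_STRAND.values()) == set(central_strands(SHEET_ORDER)))
```

Under these assumptions the two RNP motifs fall on the two central strands, while β4 and β2 form the external strands of the sheet.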
A true single-stranded nucleic acid binding domain
Since the first structure of an RRM in complex with RNA (the N-terminal domain of U1A in complex with U1snRNA stem-loop II [26]) that founded our understanding of RRM–RNA recognition, 10 structures of RRMs in complex with RNA or DNA (for hnRNPA1) have been determined either by NMR [27–30] or X-ray crystallography [31–36]. All of the structures present intrinsic common features and differences in RNA recognition reflecting the remarkable adaptability of this domain in order to achieve high affinity and specificity.

Systematic visual analysis of the conserved residues at the RRM–RNA interface for all 11 published complexes led us to define a common structural archetype of the RRM–nucleic acid interaction exemplified by hnRNPA1, an RRM protein binding both DNA and RNA with high affinity. In the structure of hnRNPA1 RRM 2 in complex with DNA [34] (Fig. 3A), two deoxynucleotides, A209 and G210, stack on two aromatic rings located on β1 (Phe108, RNP 2 position 2) and β3 (Phe150, RNP 1 position 5), respectively (Fig. 3A). The contacts with these two RNP positions result in a characteristic arrangement of the nucleic acid strand on the β-sheet surface in which the 5′ end is located on the first half of the β-sheet (β4β1) and the 3′ end on the second half (β3β2) (Fig. 3B). A third aromatic (Phe148, RNP 1 position 3) located on β3 interacts hydrophobically with the sugar rings of A209 and G210. Finally, a positively charged side chain (Arg146, RNP 1 position 1) forms a salt bridge with the phosphate between A209 and G210. This small set of RRM–nucleic acid interactions, in the center of the domain, involving four conserved protein side chains of the RRM consensus sequence and two nucleotides, illustrates the perfect adaptation of the RRM for effectively binding single-stranded nucleic acids of any sequence. Indeed, the essential chemical elements of this dinucleotide, namely the two bases, the two sugar rings and the phosphates in between, are recognized. The two bases are stacked on conserved aromatic rings, and correspondingly, RNP 2 position 2 and RNP 1 position 5 are planar residues (Phe, Tyr, His or Trp) in 78% and 72% of the 70 RRMs studied by Birney et al. [6], respectively. The two sugar rings are in contact with a hydrophobic side chain (RNP 1 position 3) that is present in 81% (67% of Phe or Tyr) of the RRMs and finally the negatively charged phosphodiester group is neutralized by a positively charged side chain (RNP 1 position 1) present in 68% of the RRMs [6]. Although the residue conservation at these four positions is strong, these four characteristic contacts are not always found all together [34]. Among the RRM–RNA/DNA complexes, the two RRMs of hnRNPA1 in complex with DNA have all four characteristic contacts, whereas only one to three of those are found in the other structures (Fig. 4). The most frequent ones are the two stacking
Fig. 3. hnRNPA1 RRM 2 as a model of single stranded nucleic acid binding [25]. (A) Structure of hnRNPA1 RRM 2 in complex with single stranded telomeric DNA and scheme of the β-sheet annotated with the conserved RNP 1 and RNP 2 aromatic residue positions numbered according to each RNP sequence numbering. The conserved aromatic residues are highlighted by green circles [34]. (B) Structural arrangement of the DNA strand on the β-sheet of hnRNPA1–RRM 2. (C) Hydrogen bond and van der Waals interaction network conferring base-binding specificity (hnRNPA1–RRM 2 complex). This figure was generated with the program MOLMOL [56].
Fig. 4. The RRM domain, a highly plastic platform for nucleic acid binding. (A) Nucleolin RRM 2–sNRE complex [28]. (B) Sex-lethal RRM 1–polyU–Tra mRNA [31]. (C) Sex-lethal RRM 2–Tra mRNA precursor complex [31]. (D) hnRNPA1 RRM 1–telomeric DNA complex [34]. (E) Poly(A)-binding protein RRM 1–polyadenylate RNA complex [33]. (F) Heterodimeric nuclear cap binding complex 5′ capped polymerase II transcripts [36]. In all figures, the RNA is shown in yellow and the protein side chain in green. The ribbon of the RRM is shown in grey. The N- and C-terminal extensions of the RRM are shown in green and red, respectively. This figure was generated with the program MOLMOL [56].
interactions involving RNP 2 position 2 (always present except in nucleolin RRM 2 [37]) and RNP 1 position 5 (always present except in CBP20 [36]). The contacts between the sugars and RNP 1 position 3 are present in five RRM–RNA complexes (CBP20, PABP RRM 1, nucleolin RRM 1 and RRM 2 and sex-lethal RRM 1). The RNP 1 position 1 residue does not necessarily interact with the phosphate within the dinucleotide because in all structures apart from hnRNPA1 it contacts an RNA base or a phosphate oxygen of other nucleotides. Also, the RRM interactions with the sugar–phosphate backbone are fairly limited compared to other types of RNA-binding proteins, such as ribosomal proteins, suggesting a less important role for this type of interaction [38].

This basic binding platform common to all RRMs is not in essence sequence-specific as eight of the 16 dinucleotide combinations have already been found: AA [33], AG [34], CG [28], CA [26], GU [31], UC [28], UG (S. D. Auweter and F. H.-T. Allain, unpublished data) and UU [31], with any type of nucleotide either at the 5′ or the 3′ position. The nucleotides at these two positions always adopt an anti conformation, except for the G at the 3′ position, which is always found in a syn conformation. Specificity of this central dinucleotide recognition is provided by other non conserved elements of the RRMs. The two most frequently observed elements are the protein side chains at the surface of the β-sheet (RNP 1 position 7 and the two adjacent positions in β1) (Fig. 3A) and the backbone and side chains of the few amino acids just C-terminal to β4. These residues are base-specifically hydrogen-bonded to the RNA or DNA functional groups as illustrated by the multiple base–amino acid contacts in hnRNPA1 RRM 2 (Fig. 3C).

A highly plastic domain to achieve high RNA-binding affinity and specificity

Many RRMs bind RNA with high affinity (in the nM range) and high sequence-specificity, in particular all those whose structures have been determined to date. Nevertheless, sequence-specificity does not necessarily imply high affinity: PTB, for example, specifically recognizes pyrimidine tracts but does not provide sufficient binding enthalpy to reach nM affinity (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data). To achieve higher affinity, some RRM proteins use the two external β4 and β2 strands, while others use the loops 1, 3 or 5, or the C- and N-termini [39]. In many proteins, multiple RRMs associate to bind longer nucleotide stretches. In these cases, the interdomain linker is an essential component of RNA recognition. In addition, the RNA secondary structure can be an important determinant of the protein binding affinity. All of these aspects are presented in detail below.

Role of the two external β-strands and the loops

The β-sheet surface of an RRM can be modulated by using only one or up to four β-strands for RNA binding. Figure 4 clearly illustrates that the β-sheet surface is not used to the same extent in each RRM–nucleic acid complex. Exceptionally, in hnRNPA1 RRM 1, each β-strand binds one nucleotide, the DNA being spread on the β-sheet from β4 to β2 in the 5′ to 3′ direction. More often, the nucleotide at the 5′ end of the central dinucleotide contacts the loops at the bottom of the β-sheet (loop 1 and loop 3 in particular, Fig. 4C) and the one at the 3′ end stacks over the previous nucleotide (Fig. 4A). In PABP RRM 1, it is different again; while A6 and A8 stack on the protein side chains at the canonical positions on β1 and β3, respectively, the nucleotide in between, A7, interacts with loop 3 (Fig. 4E).

Role of the N- and C-terminal regions

The N- and C-terminal regions of the RRM are often crucial for dramatically enhancing the RNA-binding affinity, because they enlarge the protein–RNA interaction network. In most RRM–RNA complexes, the base stacking on the aromatic residue at RNP 2 position 2 is sandwiched either by a protein side chain from the N-terminal region (CBP20) or by one from the C-terminal region of the RRM (Fig. 4D–F) [36]. This side chain can be one residue after the end of β4 as in U1A [26,27] or 16 residues afterwards as in hnRNPA1 RRM 1 [34] (Fig. 4D). The C-terminus of hnRNPA1 RRM 1 is particularly interesting because it is unstructured in the free form and becomes ordered upon DNA binding, forming a 3₁₀ helix. This structural rearrangement reinforces the concept of binding by induced fit, initially proposed with the structure of the U1A–RNA complex [27]. Side chain residues of this helix, His101 and Arg92, stack over A203 and G204, respectively (Fig. 4D) [34].

The C-terminus can also contribute to differentiating RNA from DNA by interacting with the 2′OH group of the sugar ring as shown in Fig. 4B,E. The hydroxyl group can act as a hydrogen bond acceptor interacting with protein side chains (Fig. 4E, Arg94; Fig. 4B, Arg202) as well as with the backbone amide (Fig. 4B, Gly205) and/or as a hydrogen bond donor interacting with the carbonyl oxygen of the protein backbone [38]. Other parts of the RRM domain, such as the β2-strand and the loops, also interact with the 2′OHs and help to discriminate RNA from DNA [26,31,33,35].

The C-terminal region does not always enhance, but can also inhibit RNA binding as shown in the structure of CBP20 [36] (Fig. 4F). Two residues (Asn116 and Arg123) of the C-terminus form a salt bridge located above the RNP 1 residue at position 5 (Phe85) preventing any RNA binding at this key position. Similarly in PTB, the C-terminal region of all the RRMs hydrophobically interacts with RNP 1 position 5, thereby masking this binding site (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).
Role of the RNA secondary structure in RRM binding
Some proteins such as the N-terminal RRM of U1A bind single-stranded RNA with high affinity only if the RNA is embedded within a secondary structure, stem loop (hairpin loop II of U1 snRNA [26]) or internal loop (the regulatory element of the U1A 3′ untranslated region [27]). For example, the U1A protein that recognizes a stem loop has a much weaker affinity (10⁴-fold) for a single-stranded 23-mer RNA with no base pairs, even though the proper single-stranded recognition sequence is present [26]. U1A RRM 1 specifically recognizes the target RNA through its loops 1 and 3 binding to a specific base pair. In the case of U1A bound to a fragment of U1 snRNA hairpin II, Arg52 (loop 3) makes crucial interactions with the loop-closing GC base pair and its substitution to Glu completely abolishes RNA binding [26] (Fig. 5A). U1A not only binds a stem loop but also an internal loop [27,29]. This ability to bind RNA in different environments shows the adaptability of the proteins to recognize different secondary structures as long as the key protein–RNA interactions are conserved. The closely related U2B″ RRM binds the same hexanucleotide sequence, AUUGCA, as U1A but within a different stem loop (U2 snRNA hairpin IV) and only when in complex with U2A′ (Fig. 5B). The adaptability of the RRM domain is further illustrated here, as the key residue Arg52 still interacts with the RNA stem although the closing base pair is a UU base pair in U2snRNA SLIV instead of a GC in U1snRNA SLII.

While both U1A and U2B″ recognize the bases at the top of the stem through numerous hydrogen bonds, nucleolin contacts the nucleolin recognition element (sNRE) RNA stem essentially by van der Waals interactions [28] (Fig. 5C). The two RRMs of nucleolin sandwich the seven nucleotide loop and RRM 1 and its C-terminal part recognize the unusual loop E structure [28]. The substitution of the loop E by two GC base pairs separated by a bulge increases the dissociation constant more than 100-fold (from 5 nM to 0.8 µM) [30] and, as shown in Fig. 5D, this substitution abolishes all van der Waals interactions (only one hydrogen bond from Lys95 is retained). The double-stranded stem is important for two reasons: first, it restricts the conformation of the RNA loop and reduces the entropy loss accompanying protein binding; and second, some structural features of the RNA such as the base pair (U1A and U2B″) or loop E (nucleolin) that closes the RNA loop, are crucial for positioning the RRM onto the RNA. It was postulated that the RNA structure is essential because it induces conformational changes in order to reach the bound state [27,40].

Fig. 5. Role of the RNA secondary structure in RRM binding. (A) U1A spliceosomal protein–U1 snRNA hairpin II complex [26]. (B) U2B″–U2A′ protein complex bound to U2 snRNA hairpin IV [32]. (C) Nucleolin–sNRE complex [28]. The loop E motif is composed of a sheared G5-A18 base pair, an A6-U17-G16 and a symmetric (trans-Hoogsteen) locally parallel A7-A15 base pair. (D) Nucleolin–b2NRE complex with the loop E motif substituted by a bulge (U15 between two GC base pairs) [30]. The color schemes are the same as in Fig. 4, except that the protein loops and the C-terminus are shown in blue. This figure was generated with the program MOLMOL [56].

Role of additional RRMs

The combination of two or more RRM domains allows the continuous recognition of a long nucleotide sequence (8–10 nucleotides) often drastically increasing the affinity (Kd < nM). As shown previously, the β-sheet surface can bind up to four nucleotides and up to six if loops 1 and 3 contribute extensively to binding (S. D. Auweter and F. H.-T. Allain, unpublished data). Thus, recognition of a longer single-stranded DNA or RNA requires more than one RRM to form a larger binding platform. Four structures of two consecutive RRMs in complex with RNA (sex-lethal [31], HuD [35], PABP [33] and nucleolin [28,30]) and one with DNA (hnRNPA1 [34]) have been determined. In all five cases, the two RRMs and the interdomain linker cooperatively bind RNA providing high affinity and specificity. In the free forms of sex-lethal and nucleolin, the linkers are disordered and the two RRM domains tumble independently [37,41]. In some cases (PABP, nucleolin), the interdomain linker (that is the C-terminal region of the N-terminal RRM as described above) acts as a bridge, mediating the cooperative binding of two RRM domains with the RNA. More interesting is the range of new possible conformations provided by the association of two RRMs (Fig. 6). In PABP, a large binding platform is created for the RNA; in sex-lethal and HuD, the two RRMs form a cleft in which the RNA lies; and in nucleolin the RNA is sandwiched between the RRMs. As a consequence of the relative arrangement of the two domains in sex-lethal, HuD and nucleolin, several intra-RNA interactions are created upon RNA binding that contribute to the overall enthalpy of the complex, while in PABP almost no intra-RNA interactions are present. On the contrary, hnRNPA1 RRMs 1–2 and PTB RRMs 3–4 (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished results) are arranged in such a way that only distantly located RNA sequences of the same RNA can bind simultaneously to both RRMs. These totally opposite topologies might reflect the opposite function of the various RRM proteins, as both sex-lethal and HuD are splicing activators, while hnRNPA1 and PTB are splicing repressors [42].

The RRM, also a protein–protein interaction domain

Over the last few years, biochemical and structural studies have shown that the RRM is not only involved in RNA recognition but also in protein–protein interaction. In addition to structures of multiple RRM-
Fig. 6. The RRM–RRM interactions. Several protein structures either free or in a complex in which two RRM domains interact are shown. Structures of (A) UP1 in the free form [53] (pdb:1up1), (B) nucleolin in complex with RNA [28] (pdb:1fje), (C) sex-lethal in complex with RNA [31] (pdb:1b7f), (D) PABP in complex with RNA [33] (pdb:1cvj), and (E) U1A homodimer in complex with RNA [29] (pdb:1dz5). The RNA backbone is shown in yellow (A–E), the N-terminal RRM domain is displayed green, C-terminal domain blue, and linker region red. (E) One monomer of U1A is displayed green and the other blue. In all cases, important residues for the protein–protein interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].
containing proteins as described in the previous section, structures of RRM domains in complex with various proteins or domains have been solved [32,43–51]. Analysis of these structures shows that protein recognition by RRM domains is very diverse with no general mechanism emerging. For clarity, we distinguish three main classes of RRM–protein interactions: between two RRMs, between an RNA-binding RRM and a non-RRM protein, and finally between RRMs that do not bind RNA and another protein.
Protein interaction involving two RRM domains
The first structure showing an interaction between two RRMs is that of the N-terminal region of hnRNPA1 (UP1) in its free form, which contains two RRM domains separated by a short linker [52,53]. The two RRMs form a compact fold and interact with each other via their α-helix 2. The interaction is stabilized by two salt bridges connecting two arginines of the first RRM and two aspartic acids of the second (Fig. 6A). This arrangement places the β-sheets of both domains side by side, forming an extended surface of eight β-strands. Similarly, PTB RRMs 3 and 4, separated by a 24-residue linker region, do not tumble independently in the free form (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).
These RRM–RRM interactions are not a general feature of all RRM proteins. In free sex-lethal and nucleolin, the linker is flexible and the two RRM domains are independent [28,41]. However, upon RNA binding, the two RRM domains adopt a fixed orientation and contact each other. In the nucleolin structure, the RRMs interact via two salt bridges located in the loops (Fig. 6B), whereas in the structure of hnRNPA1, the RRMs interact by salt bridges located in the α2-helix. Other examples of RNA inducing RRM–RRM interactions have also been described in the case of sex-lethal [31], PABP [33], and HuD [35]. In sex-lethal and HuD, the interdomain interaction is mainly governed by two hydrogen bonds between residues located in β1 and β4 of RRM 1 and in β2 of RRM 2 (Fig. 6C). Furthermore, additional contacts between RRM 2 and the linker region are observed. In the case of PABP, the interdomain interactions are mediated through many salt bridges and van der Waals contacts between α2 and β4 of RRM 1 and β2 and α1 of RRM 2, respectively (Fig. 6D).
Another interesting example of RRM–RRM interaction is found in the structure of the N-terminal RRM domain of the U1A protein in complex with the polyadenylation inhibition element (PIE) RNA [29]. In this case, two U1A proteins bind cooperatively to the PIE RNA [54]. The structure shows that when bound to RNA, U1A RRM 1 forms a homodimer stabilized by interactions between the two α-helical C-termini (Fig. 6E). On one side the C-terminal α-helix contains charged residues that interact with the RNA and on the opposite side it contains hydrophobic residues that constitute the dimer interface. All of these structures clearly show that RRM domains can be involved in RRM–RRM interaction in addition to RNA binding. In most of these complexes, these additional interactions contribute to the formation of a larger RNA-binding interface and are therefore critical to reach high RNA-binding affinity and specificity. This feature is likely to be frequently found in multiple RRM-containing proteins, especially if the interdomain linker is short.
Protein interaction involving one RRM domain and another domain
In some cases, it has been demonstrated that RRM-containing proteins can associate with RNA only in the presence of another protein that acts as a cofactor. Both U2B″ and CBP20 need a cofactor, U2A′ and CBP80, respectively, to recognize RNA. Ternary structures of these complexes have been solved that partially explain the importance of a cofactor in RNA–RRM binding [32,43–45]. U2A′ consists of five consecutive leucine-rich repeats, and CBP80 of three helical hairpin repeats very similar to the fold of the middle domain of the translation initiation factor 4G (MIF4G). In both cases, the RRM domains of U2B″ and CBP20 interact with the leucine rich repeat (LRR) motif or the MIF4G domain through their α-helices and loop 4, keeping the β-sheet accessible for RNA binding (Fig. 7). The interactions, however, are different as they are governed mainly by hydrophobic contacts in the U2B″–U2A′ complex, and by salt bridges and hydrogen bonds in the CBP20–CBP80 complex. Furthermore, in the case of CBP20, the N- and C-terminal extensions flanking the RRM domain become structured only when in complex with both RNA and CBP80. As for RRM–RRM interactions, these RRM–protein interactions contribute to RNA-binding specificity, U2A′ contacting the RNA and CBP80 stabilizing both the N- and C-termini of CBP20 RRM, two key components of CBP20–RNA recognition (Fig. 4) [44].
RRM domains involved only in protein recognition
Some proteins containing RRM domains are involved in protein–protein but not in protein–RNA interactions.
Fig. 7. The RRM–protein–RNA trimolecular complexes. (A) The U2B″–U2A′–RNA ternary complex [32]. (B) The CBP20–CBP80–RNA complex [36]. The RNA is shown in yellow, the RRM domain in green, and leucine-rich repeats or MIF4G domains in blue. Residues important for the interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].
Fig. 8. The Y14–Magoh complex [48]. Y14 is shown in green, and Magoh is shown in blue. The RNP 1 and 2 of Y14 are shown in red. This figure was generated using the programs MOLSCRIPT and RASTER3D [57,58].
Recently, three-dimensional structures of such proteins in complex partially explained this unexpected behavior of the RRM domain. Two different situations, however, have been reported. In one case, the protein interaction involves the β-sheet of the RRM domain, thus preventing RNA binding as in the Y14–Magoh complex [46–49] or the UPF2–UPF3 complex [50]. In a second case, the interaction is mediated through the α-helices, leaving the β-sheet solvent-exposed and therefore theoretically able to bind RNA, as with the U2AF35–U2AF65 [51] and the U2AF65–SF1 complexes [46]. In this latter case, it was postulated that the particular behavior of these RRM domains is due mainly to the identity of the amino acids on the surface of the β-sheet (see below [25]).
Y14 and Magoh proteins are part of the exon junction complex that comprises several proteins. Y14 and Magoh form a highly stable complex with nanomolar binding affinity [48]. The C-terminal domain of Y14 has a typical RRM fold and the RNP 1 and RNP 2 amino acid sequences of Y14 are very similar to other RRM domains (Fig. 1). However, Y14 does not bind RNA. Structures of the Y14–Magoh heterodimer show that Y14 binds Magoh through its entire β-sheet [46–48] (Fig. 8). This particular complex formation of the RRM neatly explains why some RRM domains do not have RNA-binding activities. Similarly, in the structure of the UPF2–UPF3 complex involved in nonsense-mediated mRNA decay, the β-sheet of the N-terminal RRM domain of UPF3 binds UPF2 [50]. Although the two RRM proteins both interact through their β-sheet, their interacting proteins, Magoh and UPF2, adopt a completely different fold. UPF2 has a totally α-helical MIF4G fold very similar to CBP80, while Magoh has an αβ fold (Fig. 8). Also striking is the fact that both UPF2 and CBP80 adopt a MIF4G fold, but recognize RRM in a totally different manner, UPF2 recognizing the RRM β-sheet and CBP80 the RRM α-helices.
The structures of the splicing factors U2AF35–U2AF65 and U2AF65–SF1 are another example of the diversity encountered in protein–RRM recognition. U2AF65 contains three RRM domains, the two N-terminal domains binding RNA while the C-terminal domain mediates SF1 interaction. U2AF35 contains a central RRM domain flanked by two zinc finger domains. The structures of U2AF35 RRM in complex with the N-terminal domain of U2AF65 and of the RRM of U2AF65 in complex with the N-terminal domain of SF1 have been solved [46,51]. Surprisingly, in this case, the protein interaction does not involve the β-sheet of the RRM domain, as it does for other non-RNA-binding RRM domains, but the two α-helices. Analysis of the RRM fold in these two structures shows striking differences from the canonical RRM domains, mainly consisting of a longer helix α1 (Fig. 2) and the absence of aromatic residues in the RNP 1 and 2 motifs. The authors therefore proposed a novel class of protein recognition motif that they named U2AF homology motif (UHM) [25].
The examples described above define a novel class of RRM domains that are involved in protein but not RNA interactions, suggesting that RRM domains might have evolved from RNA to protein recognition. Although these RRM proteins do not bind RNA, they are all implicated in RNA-related functions such as recognition of the exon junction (Y14), mRNA decay (UPF3) or pre-mRNA splicing (U2AF35 and U2AF65). This evolutionary process can be accompanied by amino acid substitutions in the RNA-binding regions, namely RNP 1 and 2, as proposed for the UHM domain. However, in the case of Y14 and UPF3, it is not entirely clear why these RRM domains that are very similar to the classical ones favor interaction with proteins rather than RNA.
Conclusion and perspectives
The RNA recognition motif is an abundant and very diverse protein motif found mainly in eukaryotes. Analysis of the structures of this domain in the free form as well as in complex with both RNA and proteins shows that this small domain is extremely diverse in terms of both structure and function. We are now just starting to understand the structural, functional, as well as evolutionary aspects of this domain. It is now clear that the original perception of the RRM as a simple rigid RNA-binding domain must evolve and that further biochemical and structural studies are needed to obtain a full picture of its role in the cell. Structures of RRM domains in complex with different RNAs show that this small compact domain is a central component of RNA recognition but not the only determinant. N- and C-terminal extensions, multiplication of RRM domains or protein cofactors can play an important role in RNA-binding specificity. This review also raises many questions concerning this domain. First, concerning RNA binding, analysis of the different structures shows that although some conserved aromatic residues are always found at the interface, the topology of the bound RNA is quite different in each complex and the sequence-specificity cannot easily be predicted. Thus, more structures of RRM–RNA complexes are needed to fully understand the determinants of this specificity. Second, RRM domains are able to bind RNA with affinities ranging from very high to weak, and the structural and thermodynamic determinants of the RNA-binding affinity still need to be elucidated. Third, as it is now demonstrated that some RRM domains are specific to protein recognition rather than RNA binding, which of the identified RRM domains are true RNA-binding domains and which ones are not? In some cases, the primary sequence can differentiate between these behaviors, as for the novel UHM domain, but in other cases, such as Y14 and UPF3, structural determinants other than the amino acid sequence must be present but are still unknown and need to be identified. Fourth, it is established that a high number of proteins contain both RRM and auxiliary domains, such as zinc fingers, also involved in nucleic acid binding. No structural studies, however, indicate if these two RNA-binding domains within the same protein influence each other for RNA binding. Finally, it has recently been discovered that the RRM domain, for a long time thought to belong exclusively to the eukaryotic world, is also present in bacteria, viruses and mitochondria. From an evolutionary point of view, it would be very interesting to investigate the function of this domain in such organisms and maybe discover their common ancestor.
In conclusion, further structural investigations on RRM domains, possibly coupled with thermodynamic and kinetic studies, are still needed to confirm present hypotheses and possibly to reveal more surprises.
Acknowledgements
The authors would like to acknowledge the financial support of the Fondation Schlumberger pour l’Education et la Recherche (postdoctoral fellowship), the Swiss National Science Foundation (Nr. 31–67098.01), the Roche Research Fund for Biology at the ETH Zurich and the SNF NCCR structural biology to FHTA.
References
1 Dreyfuss G, Swanson MS & Pinol-Roma S (1988) Heterogeneous nuclear ribonucleoprotein particles and the pathway of mRNA formation. Trends Biochem Sci 13, 86–91.
2 Adam SA, Nakagawa T, Swanson MS, Woodruff TK & Dreyfuss G (1986) mRNA polyadenylate-binding protein: gene isolation and sequencing and identification of a ribonucleoprotein consensus sequence. Mol Cell Biol 6, 2932–2943.
3 Swanson MS, Nakagawa TY, LeVan K & Dreyfuss G (1987) Primary structure of human nuclear ribonucleoprotein particle C proteins: conservation of sequence and domain structures in heterogeneous nuclear RNA, mRNA, and pre-rRNA-binding proteins. Mol Cell Biol 7, 1731–1739.
4 Bandziulis RJ, Swanson MS & Dreyfuss G (1989) RNA-binding proteins as developmental regulators. Genes Dev 3, 431–437.
5 Kenan DJ, Query CC & Keene JD (1991) RNA recognition: towards identifying determinants of specificity. Trends Biochem Sci 16, 214–220.
6 Birney E, Kumar S & Krainer AR (1993) Analysis of the RNA-recognition motif and RS and RGG domains: conservation in metazoan pre-mRNA splicing factors. Nucleic Acids Res 21, 5803–5816.
7 Maruyama K, Sato N & Ohta N (1999) Conservation of structure and cold-regulation of RNA-binding proteins in cyanobacteria: probable convergent evolution with eukaryotic glycine-rich RNA-binding proteins. Nucleic Acids Res 27, 2029–2036.
8 Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M & Sonnhammer EL (2002) The Pfam protein families database. Nucleic Acids Res 30, 276–280.
9 Hudson BP, Martinez-Yamout MA, Dyson HJ & Wright PE (2004) Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat Struct Mol Biol 11, 257–264.
10 De Guzman RN, Wu ZR, Stalling CC, Pappalardo L, Borer PN & Summers MF (1998) Structure of the HIV-1 nucleocapsid protein bound to the SL3 psi-RNA recognition element. Science 279, 384–388.
11 Sudol M, Sliwa K & Russo T (2001) Functions of WW domains in the nucleus. FEBS Lett 490, 190–195.
12 Roy G, De Crescenzo G, Khaleghpour K, Kahvejian A, O’Connor-McCourt M & Sonenberg N (2002) Paip1 interacts with poly(A) binding protein through two independent binding motifs. Mol Cell Biol 22, 3769–3782.
13 Kozlov G, Trempe JF, Khaleghpour K, Kahvejian A, Ekiel I & Gehring K (2001) Structure and function of the C-terminal PABC domain of human poly(A)-binding protein. Proc Natl Acad Sci USA 98, 4409–4413.
14 Lin KT, Lu RM & Tarn WY (2004) The WW domain-containing proteins interact with the early spliceosome and participate in pre-mRNA splicing in vivo. Mol Cell Biol 24, 9176–9185.
15 Schuster G & Gruissem W (1991) Chloroplast mRNA 3′ end processing requires a nuclear-encoded RNA-binding protein. EMBO J 10, 1493–1502.
16 Vermel M, Guermann B, Delage L, Grienenberger JM, Marechal-Drouard L & Gualberto JM (2002) A family of RRM-type RNA-binding proteins specific to plant mitochondria. Proc Natl Acad Sci USA 99, 5866–5871.
17 Nagai K, Oubridge C, Jessen TH, Li J & Evans PR (1990) Crystal structure of the RNA-binding domain of the U1 small nuclear ribonucleoprotein A. Nature 348, 515–520.
18 Liker E, Fernandez E, Izaurralde E & Conti E (2000) The structure of the mRNA export factor TAP reveals a cis arrangement of a non-canonical RNP domain and an LRR domain. EMBO J 19, 5587–5598.
19 Perez-Alvarado GC, Martinez-Yamout M, Allen MM, Grosschedl R, Dyson HJ & Wright PE (2003) Structure of the nuclear factor ALY: insights into post-transcriptional regulatory and mRNA nuclear export processes. Biochemistry 42, 7348–7357.
20 Jacks A, Babon J, Kelly G, Manolaridis I, Cary PD, Curry S & Conte MR (2003) Structure of the C-terminal domain of human La protein reveals a novel RNA recognition motif coupled to a helical nuclear retention element. Structure (Camb) 11, 833–843.
21 Avis JM, Allain FH, Howe PW, Varani G, Nagai K & Neuhaus D (1996) Solution structure of the N-terminal RNP domain of U1A protein: the role of C-terminal residues in structure stability and RNA binding. J Mol Biol 257, 398–411.
22 Perez Canadillas JM & Varani G (2003) Recognition of GU-rich polyadenylation regulatory elements by human CstF-64 protein. EMBO J 22, 2821–2830.
23 Conte MR, Grune T, Ghuman J, Kelly G, Ladas A, Matthews S & Curry S (2000) Structure of tandem RNA recognition motifs from polypyrimidine tract binding protein reveals novel features of the RRM fold. EMBO J 19, 3132–3141.
24 Simpson PJ, Monie TP, Szendroi A, Davydova N, Tyzack JK, Conte MR, Read CM, Cary PD, Svergun DI, Konarev PV, Curry S & Matthews S (2004) Structure and RNA interactions of the N-terminal RRM domains of PTB. Structure (Camb) 12, 1631–1643.
25 Kielkopf CL, Lucke S & Green MR (2004) U2AF homology motifs: protein recognition in the RRM world. Genes Dev 18, 1513–1526.
26 Oubridge C, Ito N, Evans PR, Teo CH & Nagai K (1994) Crystal structure at 1.92 Å resolution of the RNA-binding domain of the U1A spliceosomal protein complexed with an RNA hairpin. Nature 372, 432–438.
27 Allain FH, Gubser CC, Howe PW, Nagai K, Neuhaus D & Varani G (1996) Specificity of ribonucleoprotein interaction determined by RNA folding during complex formulation. Nature 380, 646–650.
28 Allain FH, Bouvet P, Dieckmann T & Feigon J (2000) Molecular basis of sequence-specific recognition of pre-ribosomal RNA by nucleolin. EMBO J 19, 6870–6881.
29 Varani L, Gunderson SI, Mattaj IW, Kay LE, Neuhaus D & Varani G (2000) The NMR structure of the 38 kDa U1A protein – PIE RNA complex reveals the basis of cooperativity in regulation of polyadenylation by human U1A protein. Nat Struct Biol 7, 329–335.
30 Johansson C, Finger LD, Trantirek L, Mueller TD, Kim S, Laird-Offringa IA & Feigon J (2004) Solution structure of the complex formed by the two N-terminal RNA-binding domains of nucleolin and a pre-rRNA target. J Mol Biol 337, 799–816.
31 Handa N, Nureki O, Kurimoto K, Kim I, Sakamoto H, Shimura Y, Muto Y & Yokoyama S (1999) Structural basis for recognition of the tra mRNA precursor by the Sex-lethal protein. Nature 398, 579–585.
32 Price SR, Evans PR & Nagai K (1998) Crystal structure of the spliceosomal U2B″-U2A′ protein complex bound to a fragment of U2 small nuclear RNA. Nature 394, 645–650.
33 Deo RC, Bonanno JB, Sonenberg N & Burley SK (1999) Recognition of polyadenylate RNA by the poly(A)-binding protein. Cell 98, 835–845.
34 Ding J, Hayashi MK, Zhang Y, Manche L, Krainer AR & Xu RM (1999) Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 13, 1102–1115.
35 Wang X & Tanaka Hall TM (2001) Structural basis for recognition of AU-rich element RNA by the HuD protein. Nat Struct Biol 8, 141–145.
36 Mazza C, Segref A, Mattaj IW & Cusack S (2002) Large-scale induced fit recognition of an m(7)GpppG cap analogue by the human nuclear cap-binding complex. EMBO J 21, 5548–5557.
37 Allain FH, Gilbert DE, Bouvet P & Feigon J (2000) Solution structure of the two N-terminal RNA-binding domains of nucleolin and NMR study of the interaction with its RNA target. J Mol Biol 303, 227–241.
38 Allers J & Shamoo Y (2001) Structure-based analysis of protein–RNA interactions using the program ENTANGLE. J Mol Biol 311, 75–86.
39 Varani G & Nagai K (1998) RNA recognition by RNP proteins during RNA processing. Annu Rev Biophys Biomol Struct 27, 407–445.
40 Showalter SA & Hall KB (2004) Altering the RNA-binding mode of the U1A RBD1 protein. J Mol Biol 335, 465–480.
41 Crowder SM, Kanaar R, Rio DC & Alber T (1999) Absence of interdomain contacts in the crystal structure of the RNA recognition motifs of Sex-lethal. Proc Natl Acad Sci USA 96, 4892–4897.
42 Grabowski PJ & Black DL (2001) Alternative RNA splicing in the nervous system. Prog Neurobiol 65, 289–308.
43 Mazza C, Ohno M, Segref A, Mattaj IW & Cusack S (2001) Crystal structure of the human nuclear cap binding complex. Mol Cell 8, 383–396.
44 Mazza C, Segref A, Mattaj IW & Cusack S (2002) Co-crystallization of the human nuclear cap-binding complex with a m7GpppG cap analogue using protein engineering. Acta Crystallogr D Biol Crystallogr 58, 2194–2197.
45 Calero G, Wilson KF, Ly T, Rios-Steiner JL, Clardy JC & Cerione RA (2002) Structural basis of m7GpppG binding to the nuclear cap-binding protein complex. Nat Struct Biol 9, 912–917.
46 Selenko P, Gregorovic G, Sprangers R, Stier G, Rhani Z, Kramer A & Sattler M (2003) Structural basis for the molecular recognition between human splicing factors U2AF65 and SF1/mBBP. Mol Cell 11, 965–976.
47 Fribourg S, Gatfield D, Izaurralde E & Conti E (2003) A novel mode of RBD-protein recognition in the Y14-Mago complex. Nat Struct Biol 10, 433–439.
48 Lau CK, Diem MD, Dreyfuss G & Van Duyne GD (2003) Structure of the Y14-Magoh core of the exon junction complex. Curr Biol 13, 933–941.
49 Bono F, Ebert J, Unterholzner L, Guttler T, Izaurralde E & Conti E (2004) Molecular insights into the interaction of PYM with the Mago-Y14 core of the exon junction complex. EMBO Report 5, 304–310.
50 Kadlec J, Izaurralde E & Cusack S (2004) The structural basis for the interaction between nonsense-mediated mRNA decay factors UPF2 and UPF3. Nat Struct Mol Biol 11, 330–337.
51 Kielkopf CL, Rodionova NA, Green MR & Burley SK (2001) A novel peptide recognition mode revealed by the X-ray structure of a core U2AF35/U2AF65 heterodimer. Cell 106, 595–605.
52 Xu RM, Jokhan L, Cheng X, Mayeda A & Krainer AR (1997) Crystal structure of human UP1, the domain of hnRNP A1 that contains two RNA-recognition motifs. Structure 5, 559–570.
53 Shamoo Y, Krueger U, Rice LM, Williams KR & Steitz TA (1997) Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 Å resolution. Nat Struct Biol 4, 215–222.
54 van Gelder CW, Gunderson SI, Jansen EJ, Boelens WC, Polycarpou-Schwarz M, Mattaj IW & van Venrooij WJ (1993) A complex secondary structure in U1A pre-mRNA that binds two molecules of U1A protein is required for regulation of polyadenylation. EMBO J 12, 5191–5200.
55 Thompson JD, Higgins DG & Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
56 Koradi R, Billeter M & Wuthrich K (1996) MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 51–5, 29–32.
57 Kraulis PJ (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr 24, 946–950.
58 Merritt EA & Murphy MEP (1994) Raster3d, Version 2.0: A program for photorealistic molecular graphics. Acta Crystallogr D Biol Crystallogr 50, 869–873.