MINIREVIEW
Alternative splicing: global insights
Martina Hallegger*, Miriam Llorian* and Christopher W. J. Smith
Department of Biochemistry, University of Cambridge, UK
Introduction
Alternative splicing allows individual genes to produce
two or more variant mRNAs, which in many cases
encode functionally distinct proteins. With the progres-
sive generation of ever larger sequence datasets, the
proportion of multi-exon human genes that are known
to be alternatively spliced has expanded to 92–94%, of
which 85% have a minor isoform frequency of at least
15% [1,2]. Despite some debate about the extent to
which all of this alternative splicing is functionally
important [3], there is no disputing that alternative
splicing is a major contributor to the diverse repertoire
of transcriptomes and proteomes. Its importance is
underscored by the fact that misregulated alternative
splicing can lead to human disease [4,5]. As part of the
overarching effort to understand how the information
encrypted within genomes is used to generate fully
functional organisms, it is therefore necessary to deci-
pher the ‘RNA codes’ underlying regulated patterns of
alternative splicing.
Traditionally, research on alternative splicing regula-
tion focused on the study of minigene models in vitro
or in vivo. The picture that emerged is that regulation
of alternative splicing occurs via the action of numer-
ous RNA binding proteins expressed at variable levels
between tissues. These activators and repressors often
mediate their effects by binding to enhancer and silen-
cer elements within or surrounding alternatively spliced
exons (reviewed in [6]). Although much progress has
been made using model systems, a drawback is that
even when a model alternative splicing event has been
thoroughly characterized it is not immediately obvi-
ous which of its features are generally shared by
Keywords
alternative splicing; microarray; RNA-Seq
Correspondence
C. W. J. Smith, Department of
Biochemistry, University of Cambridge, 80
Tennis Court Road, Cambridge CB2 1GA,
UK
Fax: +44 1223 766002
Tel: +44 1223 333655
E-mail: cwjs1@cam.ac.uk
*These authors contributed equally to this
work
(Received 26 August 2009, accepted
22 October 2009)
doi:10.1111/j.1742-4658.2009.07521.x
Following the original reports of pre-mRNA splicing in 1977, it was
quickly realized that splicing together different combinations of splice
sites (alternative splicing) allows individual genes to generate more than
one mRNA isoform. The full extent of alternative splicing only began to
emerge with large-scale genome and transcriptome sequencing projects,
which rapidly revealed that alternative splicing is the rule rather than the
exception. Recent technical innovations have facilitated the investigation of
alternative splicing at a global scale. Splice-sensitive microarray platforms
and deep sequencing allow quantitative profiling of very large numbers of
alternative splicing events, whereas global analysis of the targets of RNA
binding proteins reveals the regulatory networks involved in post-transcrip-
tional gene control. Combined with sophisticated computational analysis,
these new approaches are beginning to reveal the so-called ‘RNA code’
that underlies tissue and developmentally regulated alternative splicing, and
that can be disrupted by disease-causing mutations.
Abbreviations
CLIP, UV cross-linking and immunoprecipitation; CELF, CUGBP and ETR3 like family (of RNA binding proteins); CUGBP, CUG binding
protein; miRNA, micro-RNA; RNP, ribonucleoprotein; MBNL, muscleblind like; PTB, polypyrimidine tract binding protein; SELEX, selective
evolution of ligands by exponential enrichment; SR protein, serine-arginine rich protein.
856 FEBS Journal 277 (2010) 856–866 © 2010 The Authors Journal compilation © 2010 FEBS
coregulated alternative splicing events as part of a
common regulatory programme, and which features
are oddities of the particular model system. Over
recent years new high-throughput methodologies have
allowed the analysis of thousands of alternative splicing
events in parallel. These tools (principally splice-sensitive
microarrays, but also medium-throughput automated RT-PCR
and, increasingly, deep sequencing) allow large-scale
quantitative profiling of splice variants. This is important
in allowing the generation of large datasets of coregulated
splicing events, a prerequisite for defining RNA codes.
Biomedically, these approaches can facilitate the
identification of splicing signatures that are associated
with pathologies [7]. At the same time, additional
‘factor-centric’ datasets that can contribute to defining
the codes are provided by improved methods for defining the
full cellular complement of RNAs to which a particular
protein binds, such as CLIP (UV cross-linking and
immunoprecipitation [8]) and its ‘next generation’
derivative HITS-CLIP [9] or CLIP-Seq [10], and by global
analysis of the alternative splicing changes produced by
splicing factor knockdown or knockout.
Several recent reviews have covered different aspects
of these global analyses [11–15]. The aim of this mini-
review is to highlight some of the recently published
information that contributes towards breaking the
RNA code by the application of high-throughput
methodology, mainly focusing upon work in mamma-
lian systems. We start by providing a brief review of
the enabling technologies, and move on to discuss the
insights they have allowed and possible future develop-
ments.
Analogue and digital transcriptome profiling
Early microarrays typically contained probes consisting
of full-length cDNAs or oligonucleotide probes located
towards the 3′ end of transcripts, and were unable to
distinguish alternatively spliced isoforms. However, a
number of current array designs, in different ‘flavours’
depending on the location of the probes, can distin-
guish between splice variants (Fig. 1A, Table 1): (a) til-
ing arrays, with overlapping probes across a known
genomic sequence (a chromosome or an entire genome) [16];
(b) exon-body arrays, in which probes are located within
exons (for example, the Affymetrix human Exon Array includes
1.4 million probe sets corresponding to all known human
exons, ranging from the well annotated to more speculative
computational predictions [17–20]); (c) splice-junction arrays, which
contain probes crossing spliced junctions [21]; or
(d) exon-junction arrays, which contain probes within
exons as well as across exon junctions. Among the
exon-junction arrays that have been used successfully
are human and mouse arrays interrogating 3100 and
3700 cassette exons, respectively [22,23]. A similar
design has been used to interrogate 8315 alternative
splicing events in Drosophila [24–26]. Finally, a ‘whole
transcript’ microarray monitoring 203 672 exons and
178 351 exon junctions has allowed the identification
of more than 24 000 human alternative splicing events
[27]. Such arrays have been applied successfully to
study changes in alternative splicing under different
conditions, including tissue-specific changes
[17,27,28], cancer-associated splicing [19,29],
signal-activated splicing [26,30] and developmentally regulated
splicing [20,31], as well as to define functional targets
by splicing factor depletion [18,25,32–34] and to identify
alternative splicing events linked to nonsense-mediated decay
[35]. Although splice-sensitive microarrays have been
applied with great success (see Table 1), they have
some limitations, including cross-hybridization prob-
lems, limited dynamic range, as well as a low signal-to-
noise ratio due to background. In particular, many of
the normal rules for optimal probe design have to be
relaxed or ignored in the case of exon-junction probes.
Finally, arrays are not an ideal platform for discover-
ing new alternative splicing events, including, for
example, inclusion of pseudo-exons (see accompanying
review by Dhir and Buratti [36]), and they are limited
to organisms with sequenced genomes.
Sequence-based methods, including small tags, such
as expressed sequence tags, cap analysis of gene
expression [37], serial analysis of gene expression [38],
as well as full-length cDNAs [39,40], have been used to
obtain digital counts of transcript abundance, but they
have suffered from bias introduced in the sample prep-
aration, inability to detect lowly expressed genes and
low statistical power. The development of high-
throughput DNA sequencing technologies [10,41,42]
circumvents many of these previous barriers [1,43–48].
RNA-Seq has the capacity to generate millions of
short sequence reads (25–30 or 200–400 nucleotides
depending on the sequencing technology) of cDNAs
derived from polyA-enriched mRNA [45]. Reads are
then mapped on to unique locations on the genome
and annotated transcriptome (for splice-junction
reads), providing a digital count of expressed
sequences (exons). Differences in read densities across
genes in different conditions allow for quantification of
gene expression [2,43]. Comparison with microarray or
RT-PCR data shows that read counts give an accurate
estimate of relative gene expression levels across a very
broad dynamic range [1,2].
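As a rough illustration of how read densities can be turned into comparable expression values, the sketch below normalizes a raw read count by transcript length and by library size (reads per kilobase per million mapped reads). This particular normalization, the function name and the numbers are assumptions made for illustration; the studies cited above may have used different procedures.

```python
def reads_per_kb_per_million(read_count, transcript_length_nt, total_mapped_reads):
    """Normalize a read count by transcript length (in kb) and library size (in millions).

    Hypothetical helper for illustration; not taken from the cited studies.
    """
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (nucleotides per kb) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000-nt transcript,
# in a library of 10 million uniquely mapped reads.
density = reads_per_kb_per_million(500, 2000, 10_000_000)
print(density)
```

Because the value is scaled by both length and sequencing depth, densities computed this way can be compared between genes and between samples, which is the sense in which read counts provide a digital measure of expression.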
Because many sequence reads span exon–exon junc-
tions, RNA-Seq can identify novel splicing events. The
discovery of new alternative splicing events and
mRNA isoforms is an area where the new sequencing
technologies will have an immediate impact. However,
a greater challenge is to harness RNA-Seq for digital
quantitative profiling of alternative splicing (Fig. 1B).
In principle, changes in alternative splicing between
two conditions can be quantitated by comparing the
number of reads mapping to reciprocal events (e.g.
exon inclusion versus skipping) [2], or by normalizing
the number of reads mapping to a particular splice
junction or exon by the number of reads across the
gene. In practice, large-amplitude changes in alterna-
tive splicing events within genes that are themselves
highly expressed are readily detected (e.g. the ‘switch-like’
events reported in [2]). Wang et al. [2] detected reciprocal
reads for only one-third of 105 000 annotated alternative
splicing events; for these, tissue-specific differential
splicing was quantified using a minimum threshold of 10%
change in inclusion ratio between tissues. However, more subtle
changes in alternative splicing within genes for which
few reads are available will evade detection [49].
Recent estimates suggest that 200 million reads would
be required to quantitate accurately the splicing levels
in 80% of genes [15]. In the future, the progressively
decreasing cost and increasing read lengths and volume
of high-throughput sequencing can only advance the
ability of RNA-Seq to profile alternative splicing quan-
titatively. Methods to ‘focus’ sequence reads on to
splice junctions, such as RNA-mediated annealing,
Fig. 1. High-throughput methods for global analyses of alternative splicing. (A) Schematic representation of different splice-sensitive micro-
arrays (adapted from [27]). Exon arrays, typically Affymetrix Exon Arrays, contain oligonucleotide probe sets for every known and predicted
exon. Junction arrays, typically used in [21], contain probes spanning exon junctions across annotated genes. Exon-junction arrays typically
contain both exon-body and exon-junction probes. The coverage of these arrays varies from a few thousand cassette exons [22,23] to all
annotated alternatively spliced genes in Drosophila [24–26] or every single annotated exon and exon junction in 18 000 human genes [27].
The bottom panel shows an example of differential exon usage for a typical cassette exon by means of the differential hybridization signals.
(B) RNA-Seq. The genomic structure for a typical cassette exon is depicted in the middle of the panel, where constitutive exons are shown
in purple and the alternative cassette exon in blue. Sequence reads obtained from the high-throughput method are represented in colour-
coded rectangles (see inset) and are mapped within the genomic sequence. The counting of reads corresponding to inclusion (upper) and
skipping (bottom) allows for the estimation of ‘inclusion ratios’ for the different alternatively spliced isoforms.
selection, extension and ligation [50] or preselection by
customized capture arrays [51], might enable more
cost-effective quantitative profiling of a large number
of alternative splicing events. In the meantime, some
of the splice-sensitive microarray platforms will remain
competitive.
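The read-counting approach to splicing quantification described above can be sketched as follows. This is a minimal illustration, assuming that reads supporting exon inclusion and exon skipping have already been counted; the function names and read counts are hypothetical, the 10% threshold is interpreted here as an absolute difference in inclusion ratio, and no correction is made for the different number of positions at which inclusion and skipping reads can map.

```python
def inclusion_ratio(inclusion_reads, skipping_reads):
    """Fraction of informative reads that support inclusion of the cassette exon.

    Returns None when no reciprocal reads are available, because the
    ratio cannot be estimated without any informative reads.
    """
    total = inclusion_reads + skipping_reads
    if total == 0:
        return None
    return inclusion_reads / total


def is_differential(ratio_a, ratio_b, min_change=0.10):
    """Flag an event whose inclusion ratio differs by at least min_change.

    Assumption: the threshold is an absolute difference between ratios.
    """
    if ratio_a is None or ratio_b is None:
        return False
    return abs(ratio_a - ratio_b) >= min_change


# Hypothetical read counts for one cassette exon in two tissues.
ratio_tissue_a = inclusion_ratio(inclusion_reads=80, skipping_reads=20)
ratio_tissue_b = inclusion_ratio(inclusion_reads=30, skipping_reads=70)
print(ratio_tissue_a, ratio_tissue_b, is_differential(ratio_tissue_a, ratio_tissue_b))
```

With only a handful of reads per event the ratio becomes unstable, which is one way to see why subtle changes in weakly expressed genes are hard to detect without very deep sequencing.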
Surveying splicing regulator targets
Cataloguing the targets of RNA binding proteins that
are known splicing regulators provides a complemen-
tary entry point for unravelling RNA codes. ‘Func-
tional targets’ can be classified as the set of alternative
splicing events that are affected by perturbing the
levels of a splicing regulator, by knockdown, knockout
or overexpression. These targets can be identified by
global transcriptome profiling tools, such as splice-
sensitive microarrays [18,25,32–34], medium-through-
put RT-PCR [52], RNA-Seq or even quantitative
proteomics [53]. However, apparent functional targets
can include indirect secondary targets.
A complementary approach is to identify direct
RNA ‘binding targets’. Selective evolution of ligands
by exponential enrichment (SELEX) is an initial fully
in vitro approach that defines the optimal binding site,
typically short variably degenerate motifs, for an RNA
binding protein by iterative selection from an ini-
tially fully degenerate sequence pool [54]. A variant
approach, genomic SELEX, uses RNA transcribed
from genomic DNA as the starting pool for selection
[55]. SELEX is a useful, although not obligatory,
precursor to methods that catalogue the actual RNA
species (mRNA or pre-mRNA) bound by a splicing
regulatory protein. Direct immunoprecipitation with-
out prior cross-linking (RNP immunoprecipitation)
followed by hybridization to arrays can be a useful
approach [25]. However, a more powerful approach
for identifying binding targets is CLIP (Fig. 2), which
was originally developed to identify targets of the
neuron-specific NOVA proteins [8,56]. RNA is first
cross-linked in vivo to bound protein by UV irradia-
tion, fragmented to 100 nucleotide tags, isolated by
immunoprecipitation, reverse transcribed and then
sequenced. A key feature of CLIP is that UV induces
‘zero-length’ cross-links only between RNA and
directly bound proteins, thereby allowing enrichment
Table 1. Summary of splice-sensitive microarray analyses. A blank cell in the Array design column indicates the same array as in the row above.
Array design | Experiment | Species | Validation rate (events tested) | Reference
203 672 exons, 178 351 exon junctions | 48 tissues and cell lines | Human | 74% (23 events tested) | [27]
110 367 exons, 93 382 exon junctions | Time course of heart development | Mouse | Not mentioned | [31]
125 000 junction probes | 52 tissues and cell lines | Human | 58% | [21]
40 443 exon-junction probe sets | Nova-2 knockout brains | Mouse | 100% (49/49) | [72]
Affymetrix Exon Array (probe sets for 1 million exons) | Colon, bladder, prostate cancer tissues | Human | 66.67% (10/15) | [29]
 | 11 human tissues | Human | 86% | [17]
 | Colon cancer | Human | 33% | [19]
 | Lymphoblastoid cell lines | Human | 78% (25/32) | [16]
 | hnRNPLL knockdown in T cells | Human | Not mentioned | [18]
 | Mid-fetal brain | Mouse | 95% (65/68) | [28]
 | hnRNPL knockdown | Human | 22% (11/50) | [32]
 | PTB knockdown in N2A cells | Mouse | 27/30 | [34]
Exon Array and array featuring exon-body and exon-junction probe sets | Erythropoiesis | Human | 6 events validated | [20]
3126 cassette exons | 10 adult mouse tissues | Mouse | | [23]
3707 cassette exons | 27 tissues and cell lines | Mouse | Not mentioned | [22]
> 5000 cassette exons | Activation of Jurkat cells | Human | 68% (17/25) | [30]
3055 cassette exons | Knockdown of UPF1, UPF2, UPF3 in HeLa | Human | 83% | [35]
1300 exons | Knockdown of Sam68 | Mouse | 68.5% (24/35) | [33]
8315 mRNAs, 9868 alt junction probes | Knockdown of SR and hnRNP proteins in S2 cells | Drosophila | 100% (6/6) | [24]
 | Knockdown of hnRNP proteins in S2 cells | Drosophila | 70% | [25]
 | Alternative splicing changes upon insulin or Wingless stimulation | Drosophila | 70% (11/15) | [26]
of specifically bound sequences by immunoprecipita-
tion under stringent conditions. The original CLIP
procedure has now been modified, with direct high-
throughput sequencing of reverse transcribed tags
[9,10]. The so-called HITS-CLIP [9] or CLIP-Seq [10]
protocols allow saturated coverage of binding targets,
giving a truly global view of the RNP landscape of
individual proteins, and suggesting possible novel func-
tions. This ‘next generation’ CLIP approach has
already been applied to the splicing regulators NOVA
[57], FOX2 [58], SFRS1 (better known as SF2/ASF)
[59,60], as well as the miRNA-associated protein,
argonaute [61]. The comprehensive view afforded by
this approach reveals additional, nonsplicing-related,
roles for these RNA binding proteins. For example,
a surprising new function for NOVA2 in alternative
poly(A)-site choice was discovered. Neuronal cells in
general tend to process at promoter-distal poly(A)-sites
and the NOVA2 targets follow this trend. Proliferating
cells produce shorter 3′ UTRs and therefore reduce
the potential for miRNA regulation [62]. By the
same token, neuronal transcripts with long UTRs are
potentially more prone to regulatory inputs from both
miRNAs and 3′ UTR binding proteins.
In practice, methods to define functional and
binding targets are complementary. A comprehensive
global analysis of the Drosophila homologues of the
mammalian hnRNPA/B proteins, hrp36, hrp38, hrp40,
hrp48, involved analysis by a splice-sensitive array of
alterations in alternative splicing upon knockdown,
determination of SELEX motifs in vitro and direct
immunoprecipitation without prior cross-linking
followed by hybridization to arrays using a whole
genome tiling array [25]. This provided many insights
into the functional redundancy and specialization of
this family, and provided hints about their probable
mechanism of action. Perhaps most surprisingly, in
view of popular models about antagonism between the
two families of proteins, very few alternative splicing
events were found to be regulated by both hnRNP and
SR proteins [24,25].
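The complementarity of functional and binding targets can be illustrated with a simple set comparison. The sketch below is an assumption-laden illustration, not a method taken from the cited studies: the event identifiers are hypothetical, and the labels given to each class are interpretive suggestions only.

```python
def classify_targets(functional_targets, binding_targets):
    """Partition splicing events by the type of evidence that supports them."""
    functional = set(functional_targets)
    binding = set(binding_targets)
    return {
        # Splicing changes on perturbation AND the protein binds the RNA.
        "candidate_direct": functional & binding,
        # Splicing changes but no binding detected: possibly indirect targets.
        "possibly_indirect": functional - binding,
        # Bound but splicing unchanged: possible redundancy or a non-splicing role.
        "bound_only": binding - functional,
    }


# Hypothetical event identifiers, for illustration only.
functional = {"event_1", "event_2", "event_3"}  # e.g. changed upon knockdown
binding = {"event_2", "event_3", "event_4"}     # e.g. recovered as bound RNAs

classes = classify_targets(functional, binding)
for label in sorted(classes):
    print(label, sorted(classes[label]))
```

Events supported by both kinds of evidence are the most informative for defining regulatory motifs, whereas the two remaining classes indicate where each approach used alone could mislead.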
Tissue and individual variations in alternative splicing
Over the last year, several reports have focussed on
the global analysis of transcript isoform differences
between human tissues [1,2,16,27,28,47,63,64], mouse
tissues [31,63], normal and cancer tissues [64], in
response to specific signalling pathways in Drosophila
[26], or developmental transitions in human brain [28],
mouse heart [31] and mouse stem cells [63]. The combi-
nation of these approaches has revealed extensive
transcript complexity.
Sequencing approaches show that many transcripts
extend beyond the previously annotated 5′ and 3′ gene
Fig. 2. HITS-CLIP. Intact tissue or tissue culture cells are UV irradiated to induce covalent cross-links between RNA and RNA binding proteins.
Cells are lysed under very stringent conditions, treated with DNase and partially digested with RNases. The RNA–RNP complex is
pulled down by immunoprecipitation. The RNA is radioactively 5′ labelled and ligated to a 5′ RNA linker. The sample is run on SDS/PAGE
with neutral pH and blotted. Only RNA cross-linked to protein will be transferred on to the membrane. A small fragment of membrane is
isolated at a position that corresponds to the protein plus RNA of between 50 and 100 nucleotides. After proteinase K digestion, the RNA is
recovered from the membrane and ligated at its 3′ end to an RNA adapter with complementarity to the RT primer. The following PCR step,
with primers complementary to the ligated linkers, also allows the addition of appropriate HITS-specific primer sequences (adapted from [76]).