REVIEW ARTICLE
Expressed protein ligation
Method and applications
Ralf David, Michael P.O. Richter and Annette G. Beck-Sickinger
Institute of Biochemistry, Faculty of Biosciences, Pharmacy and Psychology, University of Leipzig, Germany
The introduction of noncanonical amino acids and biophysical probes into peptides and proteins, and total or segmental isotopic labelling, have the potential to greatly aid the determination of protein structure, function and protein–protein interactions. To overcome the size limit of peptides accessible by solid-phase peptide synthesis, native chemical ligation was introduced, enabling the synthesis of proteins of up to 120 amino acids in length. Following the discovery of inteins, with their self-splicing properties, and their application in protein synthesis, the semisynthetic methodology of expressed protein ligation was developed to circumvent these size limitations. Today, diverse expression vectors are available for producing, in large amounts and high purity, the N- and C-terminal fragments needed for ligation (protein α-thioesters and peptides or proteins with an N-terminal Cys). Unfortunately, expressed protein ligation is still limited mainly by the requirement for a Cys residue at the ligation site. Additional Cys residues can be introduced into the sequence by site-directed mutagenesis or synthesis; however, such mutations may disturb protein structure and function. Recently, alternative ligation approaches have been developed that do not require Cys residues. Accordingly, it is theoretically possible to obtain any modified protein using ligation strategies.
Keywords: expressed protein ligation; IMPACT™-system; intein; native chemical ligation.
Introduction
Proteins and peptides that have been modified by intro-
ducing noncanonical amino acids, fluorescence tags, spin
resonance labels or cross-linking agents have great potential
for investigations into protein–protein interactions and can
help to elucidate protein structures. Furthermore, artificial
peptides and proteins with new properties and with a broad
range of applications can be obtained. Further interest lies in segmental or complete isotopic labelling for NMR studies to determine protein structures.
Solid-phase peptide synthesis (SPPS) provides the pos-
sibility of introducing noncanonical amino acids into
peptides but is restricted to peptides of up to 60 amino
acids in length. By using expression systems in bacteria or
yeast, the recombinant generation of peptides and proteins
and their complete isotopic labelling has become possible
[1–3]. The size of the constructs is not restricted but the
insertion of noncanonical amino acids is difficult [4,5]. The
limitation of peptide size in SPPS was circumvented by
several approaches developed for the synthesis of proteins
by segment condensation [6]. Liu et al. used a glycolalde-
hyde peptide ester for the reaction of an unmasked aldehyde
with an amino-group of an N-terminal Cys or Ser to form
a thiazolidine- or oxazolidine-ring. Rearrangement of the
O-acyl-ester resulted in an amide bond with a pseudoproline
residue [7]. In the thiol capture approach, where only Cys
sidechains have to be protected, a 4-mercapto-dibenzofuran
ester forms an asymmetric disulfide bond with an
N-terminal Cys activated with an S-(methoxycarbonyl)sulfenyl (Scm) group of a second peptide. The free amino group of this Cys can then attack the carbonyl group of the ester, and an O→N acyl transfer results in an amide bond. Reductive cleavage of the disulfide releases the free Cys sidechain [8]. CNBr-cleavage fragments refold and form noncovalent complexes, and finally the missing peptide bonds are re-formed [9]. Cytochrome c CNBr fragments 1–65 and 66–104 were modified and religated by this method [10], but this technique is limited by the requirement for a Met at the cleavage site.
Correspondence to A. G. Beck-Sickinger, Institute of Biochemistry, University of Leipzig, Brüderstr. 34, D-04103 Leipzig, Germany.
Fax: + 49 341 97 36 909, Tel.: + 49 341 97 36 900,
E-mail: beck-sickinger@uni-leipzig.de
Abbreviations: BAL, backbone amide linker; CBD, chitin binding domain; eGFP, enhanced green fluorescent protein; EPL, expressed protein ligation; FRET, fluorescence resonance energy transfer; GFP, green fluorescent protein; HOBt, 1-hydroxybenzotriazole; IMPACT™, intein-mediated purification with an affinity chitin binding tag; IPL, intein-mediated protein ligation; NCL, native chemical ligation; PTPase, protein tyrosine phosphatase; SPPS, solid-phase peptide synthesis; TROSY, transverse relaxation optimized spectroscopy; TWIN, two intein system.
(Received 12 November 2003, revised 19 December 2003, accepted 5 January 2004)
Eur. J. Biochem. 271, 663–677 (2004) FEBS 2004 doi:10.1111/j.1432-1033.2004.03978.x

Dawson et al. introduced a simple and elegant method called native chemical ligation (NCL) for the synthesis of peptides by condensation of their unprotected segments. The coupling of synthetic peptide thioesters with peptides carrying an N-terminal Cys produces an amide bond at the ligation site. This approach has proven useful for the synthesis of smaller proteins of up to 120 amino acids in length; larger proteins cannot easily be obtained in one
ligation step. Multistep NCL of different peptide-segments,
however, can lead to larger proteins [11]. An extension of
this NCL strategy is the expressed protein ligation (EPL)
method [12], which uses recombinant thioesters and/or α-Cys peptides (peptides with an N-terminal Cys). This review gives an overview of this method and
its applications in the past few years.
Native chemical ligation
The method of native chemical ligation was introduced by
Dawson et al. [13,14] and is based on the reaction between a
thioester and the sidechain of a Cys residue reported for
the first time by Wieland et al. [15]. Two fully unprotected
synthetic peptides react to form an amide bond, so they are
connected as in the native peptide backbone. The reaction
proceeds in aqueous solution at neutral pH. The first step of this process is the chemoselective transthioesterification of an unprotected peptide Cα-thioester with the N-terminal Cys of a second peptide. The resulting thioester spontaneously undergoes an S→N acyl transfer to form a native peptide bond, and the product is obtained directly in its final form. Internal Cys residues within both peptide segments are permitted because the initial transthioesterification step is reversible and no side products are obtained; thus, no protecting groups are necessary. An alternative method was introduced by Tam et al. [16,17], in which a C-terminal thiocarboxylic acid S-alkylates an N-terminal α-bromoAla to form a covalent thioester. This rearranges by an S→N acyl shift to give an -X-Cys- peptide bond (Fig. 1).
To prevent the thiol of the N-terminal Cys from oxidation,
and thus forming an unreactive disulfide linked dimer, it is
necessary to add thiols or other reducing reagents like tris(2-
carboxyethyl)phosphine (TCEP) [18] to the reaction mix-
ture. Furthermore, the addition of an excess of thiols not
only keeps the thiol-functions reduced but also increases the
reactivity by forming new thioesters through transthioeste-
rification [19]. The addition of solubilizing agents such as
urea or guanidinium hydrochloride does not affect the
ligation reaction and can be used to increase the concentration of peptide segments, which results in higher yields. The compatibility and efficiency of all proteinogenic amino acids at the C-terminus of the thioester peptide in NCL were determined by Hackeng et al. [20]. All amino acids except Val, Ile and Pro readily occupy the X position of the -X-Cys- junction; Val, Ile and Pro are reported to react slowly. Asp and Glu are also less favourable as C-terminal residues because of the formation of side products [21].
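The ligation rules described above can be summarized at the sequence level. The following Python sketch is a hypothetical helper (the names `native_chemical_ligation` and `LigationResult` are illustrative and do not come from any library). It assumes one-letter sequence strings, treats the thioester as implicit at the C-terminus of the first segment, and reduces transthioesterification plus the S→N acyl shift to their net result, a native amide bond. It only flags the junction residues mentioned in the text and does not model kinetics or yields.

```python
from dataclasses import dataclass

# Junction residues flagged in the text (one-letter codes, simplified rules)
SLOW_RESIDUES = {"V", "I", "P"}     # reported to react slowly at the X-Cys junction
SIDE_PRODUCT_RESIDUES = {"D", "E"}  # less favourable: side products reported


@dataclass
class LigationResult:
    product: str         # full-length sequence after ligation
    warnings: list[str]  # junction caveats taken from the rules above


def native_chemical_ligation(thioester_seg: str, cys_seg: str) -> LigationResult:
    """Sequence-level model of NCL (hypothetical helper).

    thioester_seg: peptide whose C-terminus carries the alpha-thioester (implicit)
    cys_seg:       peptide that must start with an N-terminal Cys
    Internal Cys residues in either segment are allowed, as in the text.
    """
    if not thioester_seg or not cys_seg:
        raise ValueError("both segments must be non-empty")
    if cys_seg[0] != "C":
        raise ValueError("the second segment needs an N-terminal Cys")

    x = thioester_seg[-1]  # residue X of the -X-Cys- junction
    warnings = []
    if x in SLOW_RESIDUES:
        warnings.append(f"{x}-Cys junction: reported to react slowly")
    if x in SIDE_PRODUCT_RESIDUES:
        warnings.append(f"{x}-Cys junction: side products reported")

    # Transthioesterification followed by the S->N acyl shift leaves a native
    # amide bond, so at the sequence level the product is the concatenation.
    return LigationResult(thioester_seg + cys_seg, warnings)
```

For example, `native_chemical_ligation("AAV", "CG")` gives the product `"AAVCG"` together with one warning about the slowly reacting Val-Cys junction.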
A useful application of NCL is solid-phase chemical
ligation (SPCL) [22]. In this approach, one of the two
segments is bound to a polymer, while the other is applied in
aqueous solution and can be used in excess. A simple washing
step completely removes the solubilized peptides and the
assembled full length protein can be cleaved from the resin.
In the tandem peptide ligation approach, NCL is applied to the synthesis of peptides and proteins requiring two or more ligation steps: NCL is combined with pseudoproline ligation by imine capture [23], and a third step can be pseudoglycine ligation [24].
In addition to Cys, related amino acids, including
selenoCys [25] and selenohomoCys [26], have been reported
to work in a similar manner.
Thioester formation
The bottleneck in NCL is the generation of the thioester.
Several applications have been developed using solid-phase
peptide synthesis. Most of the strategies to obtain peptide
thioesters have used the Boc-strategy [13,17] because of the
base-lability of the thioester. Nevertheless, several attempts have been made to synthesize thioesters by the 9-fluorenylmethoxycarbonyl (Fmoc) method. In general,
the Fmoc-strategy has several advantages over the Boc-
strategy, the first being the milder conditions used for
cleavage from the resin. To circumvent the susceptibility of
the thioester linkages to nucleophiles like piperidine, used
for the removal of the Fmoc-protecting group, several
cocktails for deprotection have been developed, e.g.,
1,8-diazabicyclo[5.4.0]undec-7-ene (DBU) with 1-hydroxy-
benzotriazole (HOBt) [27], 1-methylpyrrolidine with
hexamethyleneimine and HOBt [28] or DBU and HOBt
[29]. The final cleavage from the resin then results in the
peptide thioester.
Further methods were introduced that used different
resins. One is based on modifications of Kenner's sulfonamide 'safety catch' linker [30]. The C-terminus of the growing peptide chain is attached to the resin with an acid- and base-stable N-acyl sulfonamide linker. The sulfonamide is activated after peptide synthesis by N-alkylation using diazomethane or iodoacetonitrile. Cleavage with nucleophiles such as thiols finally yields a peptide thioester [31,32]. In the backbone amide linker (BAL)
strategy, the first amino acid, protected at its carboxy terminus, is attached to the resin through its backbone nitrogen. The peptide
chain grows in the N-terminal direction. Deprotection,
activation and thioester formation at the carboxy terminus
occurs on the solid support. The peptide thioester can be
cleaved from the resin with trifluoroacetic acid [33].
Fig. 1. Ligation of unprotected peptide segments. In native chemical ligation (A), the first step is a transthioesterification of a Cα-thioester by the thiol function of an N-terminal Cys, followed by a spontaneous S→N acyl shift to obtain a native peptide bond. In an alternative approach (B), a Cα-thiocarboxylic acid reacts with an α-bromo amino acid to form a thioester. This leads to the same product as in method A.

Another approach uses standard resins such as phenylacetamidomethyl (PAM) or 4-hydroxymethyl benzoic acid (HMBA) and releases the peptide thioester with the Lewis acid Al(CH₃)₂Cl and thiols in methylene chloride [34]. Unfortunately, the alkylaluminium
thiolate method can lead to epimerization at the C-terminus
and reactions at the sidechains, e.g., sidechain thioesters and
aspartimide formation. This can be avoided by using a
weaker Lewis acid, e.g. Al(CH₃)₃ [35]. A further possibility
is the synthesis of peptides on Cl-trityl-resin and the
cleavage of the fully protected peptide chain with acetic
acid and trifluoroethanol. The thioester can be obtained by
the treatment of the protected peptide with activating
reagents and thiols [36,37]. After deprotection of functional
sidechains with trifluoroacetic acid, the thioester can be
easily purified by HPLC (Fig. 2).
An alternative approach for the thioester synthesis of
larger peptides and proteins in high yields and purity uses
a bacterial expression system based on the intein mediated
self-splicing mechanism of precursor proteins as discussed
below.
Recombinant generation of proteins
with C-terminal thioester or N-terminal Cys
Inteins and their use in protein chemistry
Inteins are internal segments of precursor proteins that catalyze their own excision in an intramolecular process called protein splicing, with concurrent ligation of the two flanking external regions (N- and C-exteins) through a native peptide bond. This yields the mature host protein. Inteins are thus the protein analogues of self-splicing RNA introns.
The first intein was discovered in 1987, and over 100 inteins have been listed to date [38–40]. The origin of inteins is not yet clear. However, their evolution, distribution and properties are easier to understand if inteins are considered as parasitic genetic elements, which do not contribute to an organism's fitness but are nevertheless propagated into the next generation. The insertion of an intein gene into
a protein gene can be described through the so called
homing cycle. Homing is the transfer of a parasitic genetic
element to a cognate allele that lacks the element. This
process results in the duplication of the parasitic genetic
element and its rapid spread in a population [41–43]. Inteins
occur in organisms of all three domains of life as well as in
viral and phage proteins. There they are predominantly
found in enzymes involved in DNA replication and repair
[40,44]. Inteins can be divided into four classes: the maxi
inteins (with integrated endonuclease domain), mini inteins
(lacking the endonuclease domain), trans-splicing inteins
(where the splicing junctions are not covalently linked) and
Ala inteins (Ala as the N-terminal amino acid) [45]. The sequences of inteins share some characteristics. They occur in conserved regions of the host protein, and all intein sequences harbour the motifs termed A and B (block B contains a conserved Thr and His) in the N-terminal splicing domain and F and G in the C-terminal splicing domain (Fig. 3). Endonuclease-containing inteins also bear the blocks C, D, E and H [38,46]. The N-terminal amino acid of an intein is typically Cys, Ser or Ala. The C-terminal block G contains a conserved His/Asn pair and is followed by a Cys, Ser or Thr at the first position of the C-extein.
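As an illustration of how these conserved junction residues could be screened in a candidate sequence, the Python sketch below checks only three positions: intein residue 1 (Cys, Ser or Ala), the penultimate His and C-terminal Asn of block G, and C-extein residue +1 (Cys, Ser or Thr). The function name is hypothetical, the intein boundaries are assumed to be known already, and real inteins can deviate (for example, inteins lacking the penultimate His), so an empty result is a plausibility check rather than a prediction of splicing activity.

```python
# Conserved splice-junction residues, as described in the text (simplified)
INTEIN_FIRST = {"C", "S", "A"}   # intein residue 1: Cys, Ser or Ala
CEXTEIN_FIRST = {"C", "S", "T"}  # C-extein residue +1: Cys, Ser or Thr


def check_splice_junctions(intein: str, c_extein: str) -> list[str]:
    """Return deviations from the conserved junction residues (hypothetical helper).

    Only intein residue 1, the block G His-Asn pair at the intein C-terminus
    and C-extein residue +1 are inspected. Sequences are one-letter strings;
    an empty list means no deviation was found at these positions.
    """
    if len(intein) < 2 or not c_extein:
        raise ValueError("intein needs at least two residues, C-extein at least one")
    issues = []
    if intein[0] not in INTEIN_FIRST:
        issues.append(f"residue 1 is {intein[0]}, expected Cys, Ser or Ala")
    if intein[-2:] != "HN":
        issues.append(f"intein ends in {intein[-2:]}, expected the His-Asn pair of block G")
    if c_extein[0] not in CEXTEIN_FIRST:
        issues.append(f"residue +1 is {c_extein[0]}, expected Cys, Ser or Thr")
    return issues
```

With a toy intein such as `"CLAEGHN"` followed by a C-extein starting with Cys, the function returns an empty list; changing residue 1 to Gly, the final Asn to Ala and residue +1 to Ala reports all three deviations.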
The nucleophilic thiol or hydroxyl sidechains of the conserved amino acid residues led to the assumption that (thio)esters formed by an N→S or N→O acyl shift are intermediates of the internal rearrangement steps of the splicing reaction. This was confirmed by various investigations.
Fig. 2. Formation of synthetic peptide α-thioesters. Peptide α-thioesters can be synthesized by the Fmoc strategy using backbone amide linker resins (A), by acidic cleavage from mercaptoalkyl linker resins (B), by Lewis acid-activated cleavage from common resins (C), by cleavage of fully protected peptides (Boc, t-butyloxycarbonyl; tBu, t-butyl) and deprotection after thioester generation (D), and by using sulfonamide safety catch linker resins (E).
Fig. 3. Characteristic positions of intein motifs and numbering. The inserted intein is flanked by the N-terminal extein (left shaded box) and the C-terminal extein (right shaded box). The residues important for the splicing process, as well as the conserved segment blocks (A, B, C, D, E, H, F, G) and some internal key amino acids of the intein, are depicted in one-letter code within the respective segments (bold black). The amino acids of a precursor protein are numbered as follows: the intein's N-terminal amino acid (Cys, etc.) is numbered 1, the C-terminal amino acid of the N-terminal extein is numbered −1, and the residues of the C-terminal extein are numbered beginning with +1 at its N-terminal residue.
Replacement of the N-terminal amino acid residue carrying a nucleophilic thiol or hydroxyl sidechain, or of the Asn at the C-terminus, by site-directed mutagenesis resulted in a complete loss of the splicing activity of the intein [47,48].
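The precursor numbering convention of Fig. 3 (−1, 1, +1) is easy to get wrong by one position. The hypothetical Python helper below labels every residue of a precursor, assuming that the intein boundaries are already known and given as 0-based, inclusive indices; it is a bookkeeping sketch, not a tool from the cited work.

```python
def intein_numbering(precursor: str, intein_start: int, intein_end: int) -> list[tuple[str, str]]:
    """Label every residue of a precursor with its Fig. 3-style position.

    intein_start and intein_end are 0-based, inclusive indices of the intein
    (an assumption of this sketch). N-extein residues count back from -1,
    intein residues count up from 1, C-extein residues count up from +1.
    """
    if not 0 <= intein_start <= intein_end < len(precursor):
        raise ValueError("intein boundaries out of range")
    labels = []
    for i, residue in enumerate(precursor):
        if i < intein_start:
            pos = str(i - intein_start)      # ..., -2, -1 (N-extein)
        elif i <= intein_end:
            pos = str(i - intein_start + 1)  # 1, 2, ... (intein)
        else:
            pos = f"+{i - intein_end}"       # +1, +2, ... (C-extein)
        labels.append((residue, pos))
    return labels
```

For a toy precursor `"GKCAHNCF"` with the intein at indices 2 to 5, Lys is labelled −1, the intein Cys 1, the intein Asn 4 and the following C-extein Cys +1.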
Splicing mechanism
The first step of the well-understood standard splicing process of inteins (Fig. 4) is the transfer of the N-terminal extein unit to the sidechain -SH or -OH group of the Cys/Ser residue located at the immediate N-terminus of the intein (N→S or N→O acyl shift). Some inteins bear Ala at their N-terminal position. In such cases, the first step is bypassed [48,49], and the +1 nucleophile within the C-extein attacks the carbonyl carbon of the peptide bond at the N-terminal splicing junction. This rearrangement seems to be thermodynamically highly unfavourable, but the molecular architecture of the intein forces the scissile peptide bond into a twisted conformation of higher energy and thereby pushes the equilibrium towards the (thio)ester side. The following step is a second transfer of the N-terminal extein, to the Cys/Ser/Thr at the +1 position of the C-extein, which leads to a branched intermediate. In the last step, which might be a concerted reaction, the conserved Asn residue at the C-terminus of the intein cyclizes, and a peptide bond is formed between the two exteins through an S→N (or O→N) acyl shift [50].
This splicing mechanism underscores the importance of the conserved amino acids flanking the splicing junctions, such as the block B Thr and His and the block G His [45].
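The dependence of each splicing step on particular residues can be written as a small rule set. The Python sketch below is a deliberate simplification: it covers Cys/Ser-initiated inteins only (Ala inteins, whose first step is bypassed, are not modelled), treats each residue as either functional or not, and ignores rates and equilibria. The function name and event labels are hypothetical. It reproduces the mutant behaviour described for Fig. 4: Cys1→Ala blocks the N-terminal step, while replacing the C-terminal Asn with Ala blocks the C-terminal step and leaves the N-terminal (thio)ester to be cleaved by an added thiol.

```python
def splicing_events(intein_first: str, cextein_first: str, intein_last: str,
                    thiol_added: bool = False) -> set[str]:
    """Toy rule set for a Cys/Ser-initiated intein (Ala inteins not modelled).

    intein_first:  intein residue 1 (Cys/Ser start the N->S or N->O acyl shift)
    cextein_first: C-extein residue +1 (Cys/Ser/Thr forms the branched intermediate)
    intein_last:   C-terminal intein residue (Asn cyclizes to a succinimide)
    thiol_added:   whether an external thiol such as DTT is present
    """
    n_active = intein_first in {"C", "S"}
    branch_ok = cextein_first in {"C", "S", "T"}
    c_active = intein_last == "N"

    if n_active and branch_ok and c_active:
        # (thio)ester -> branched intermediate -> Asn cyclization -> S/O->N shift
        return {"spliced"}

    events = set()
    if c_active:
        # Asn cyclization can cleave the C-terminal junction on its own
        events.add("C-terminal cleavage")
    if n_active and thiol_added:
        # an external thiol intercepts the N-terminal (thio)ester
        events.add("N-terminal cleavage")
    return events
```

For instance, `splicing_events("C", "C", "A", thiol_added=True)` returns only N-terminal cleavage, which is the situation exploited for generating protein thioesters.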
In the case of C-terminal splicing, the cumulative data indicate that the penultimate His assists the cyclization of the C-terminal Asn, although some reported mutants of this residue still splice. The three-dimensional structure of the splicing domain at the N-terminal part of the intein forces the peptide bond into a twisted conformation, and this bond could also be protonated by the penultimate His residue mentioned above. Mutation of this His did not affect the splicing steps up to the branched intermediate but abolished the final step. In the X-ray crystal structure of the Mycobacterium xenopi gyrase A (Mxe GyrA) intein (Fig. 5), His197 is hydrogen bonded to Asn198, which orients His197 to donate a proton from its Nδ position to the emerging α-amino group of the C-extein prior to the S→N acyl shift [51,52]. Some putative inteins that lack the penultimate His residue are either inactive or use other amino acids in its place. Accordingly, the penultimate His is not absolutely required but increases the splicing rate.

Fig. 4. Mechanism of intein-mediated protein splicing. In the initial step, a thioester intermediate is formed by an N→S acyl shift at the N-terminal Cys of the intein (Cys1). Transthioesterification, by nucleophilic attack of the sidechain of the N-terminal Cys of the C-extein (Cys+1) on the thioester formed in the first step, results in a branched intermediate. Peptide bond cleavage coupled to succinimide formation of the C-terminal intein Asn releases the intein. The linked exteins undergo a spontaneous S→N acyl shift to yield a peptide bond. Peptide bond cleavage can occur independently at both splicing sites. Mutation of Cys1 to Ala prevents cleavage at the N-terminal splicing junction, so that only the C-terminal extein is released and the N-terminal extein remains bonded to the intein. C-terminal cleavage cannot occur when the C-terminal Asn is substituted by Ala; in that case, the N-terminal extein is cleaved off by nucleophilic attack.

Block B contains Thr and His, separated by two amino acids. Both play a key role in the N-terminal splicing
process. Substituting the block B His with Leu in Sce VMA abolished splicing [53,54], and only C-terminal cleavage occurred. This implies that this His residue takes part in the first N→S rearrangement at the N-terminal splicing junction. X-ray crystal structures of Sce VMA1 [55–57] and Mxe GyrA [51] with exteins revealed protonation of the scissile peptide bond by the imidazole ring. This interaction promotes the breakdown of the tetrahedral intermediate formed during the nucleophilic attack that generates the N-terminal thioester. These findings were further elucidated and confirmed by investigations of Ala inteins. The exact role of Thr is not yet fully understood because of the lack of available structural data. It has been postulated that the Mxe GyrA intein stabilizes the tetrahedral intermediate at the N-terminal splicing junction through an oxyanion hole formed by Nδ of Asn74 and the block B Thr.
Together, the spatial constraints and the electronic effects make the carbonyl carbon of the scissile peptide bond a reactive and accessible electrophile, which suggests an acid/base catalysis mechanism.
Furthermore, divalent transition metal cations influence the protein splicing process. For the split inteins Ssp DnaE and Mtu RecA, micromolar concentrations of Zn²⁺ ions decreased the splicing rate, and Zn²⁺ concentrations in the millimolar range stopped the process completely through chelation of key amino acids. A similar effect was observed for Cd²⁺ ions [58,59].
Classification of inteins
The elucidation of the splicing mechanism and the identi-
fication of the key amino acid residues involved in the
scission and ligation of the peptide bonds facilitated the
molecular engineering of artificial inteins as tools for
different applications in protein chemistry. There are currently five general uses of inteins in this field: (a) modified inteins with an inducible autocatalytic cleavage activity are used for protein purification; (b) inteins are used for trans-splicing, in which the intein is split into two fragments that can recombine and reconstitute splicing activity in vivo or in vitro; (c) intein-mediated protein ligation
(IPL) is used for the generation of specifically mono-
activated proteins, which can further be ligated with peptide
segments and provides access to artificially labelled proteins;
(d) inteins facilitate the synthesis of cyclic proteins and
(e) inteins are used for the detection of protein–protein
interactions [45,46].
Three dimensional structures of inteins
The structure of the intein Sce VMA1 that was determined
by X-ray crystallography clearly shows two domains
(Fig. 5) [55–57]. The structure of the splicing domain is
similar to that of the mini intein in the Mycobacterium
xenopi gyrase (Mxe GyrA) [51]. Residues of the Sce VMA1 endonuclease domain, as well as parts of the splicing domain that are distant from the Sce VMA1 cleavage site, contribute to sequence-specific contacts with the target DNA. Photo-crosslinking studies have been performed to identify these residues [60]. The splicing domains consist predominantly of β-structures and closely resemble the structure of the hedgehog proteins, which are important in the development of multicellular organisms [61].
Formation of C-terminal thioester-activated proteins
Protein engineering via NCL requires the specific generation
of C-terminal thioester-tagged proteins allowing ligation
with a second peptide or protein containing an N-terminal
Cys or Ser residue. An efficient route to Cα-thioesters of bacterially expressed proteins emerged from studies of the N-terminal cleavage mechanism of inteins. In general, the peptide bonds at the N-terminus and at the C-terminus of the intein can be cleaved independently. Replacement of the C-terminal Asn by Ala blocked the splicing process of the Pyrococcus species GB-D intein. However, the absence of succinimide formation did not affect the preceding N→O acyl rearrangement at the N-terminal splicing junction. The same had been observed previously for the N→S acyl shift in the Sce VMA intein. Incubation of this modified intein with thiols such as dithiothreitol releases the corresponding C-terminal thioester-tagged extein from the N-terminal splicing junction by transthioesterification. This thiol-inducible cleavage activity of an engineered intein was the starting point for the extensive exploitation of other intein mutants as workhorses in biotechnology to obtain mono-thioester-labelled proteins and α-Cys proteins [46,50].
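The thiol-inducible cleavage described above is the recombinant route to the thioester used in expressed protein ligation. As a sequence-level sketch only, the hypothetical Python functions below assume a target–intein fusion whose intein starts with Cys1 and whose C-terminal Asn has been replaced by Ala. The thioester is represented by a small record holding the sequence and the name of the thiol, and the subsequent ligation with a peptide carrying an N-terminal Cys is reduced to concatenation; none of this reflects a specific published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thioester:
    """A protein or peptide carrying a C-terminal thioester (one-letter sequence)."""
    sequence: str
    thiol: str  # thiol that forms the thioester, e.g. "DTT"


def thiol_cleave(target: str, intein: str, thiol: str) -> tuple[Thioester, str]:
    """Release `target` as a C-terminal thioester from a target-intein fusion.

    Assumes Cys1 at the intein N-terminus (N->S acyl shift) and no C-terminal
    Asn (succinimide formation blocked), so only the N-terminal junction reacts;
    the added thiol then cleaves the thioester by transthioesterification.
    Returns the thioester-tagged target and the released intein sequence.
    """
    if not target:
        raise ValueError("target must be non-empty")
    if not intein.startswith("C"):
        raise ValueError("intein must start with Cys1 to form the N-terminal thioester")
    if intein.endswith("N"):
        raise ValueError("intein still ends in Asn; splicing could compete with thiolysis")
    return Thioester(target, thiol), intein


def ligate(thioester: Thioester, cys_peptide: str) -> str:
    """Ligate a thioester with a peptide carrying an N-terminal Cys (net sequence only)."""
    if not cys_peptide.startswith("C"):
        raise ValueError("ligation partner needs an N-terminal Cys")
    return thioester.sequence + cys_peptide
```

Separating cleavage from ligation mirrors the two experimental stages: the thioester-tagged protein can be purified first and then combined with a synthetic α-Cys peptide.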
Fig. 5. Comparison of Mxe GyrA (A) and Sce
VMA (B) intein structure. The structures of
both inteins have been determined by X-ray
crystallography [51,55,56] (PDB files 1AM2 and 1LWS, http://www.rcsb.org/pdb/). Blue arrows indicate β-sheets, whereas purple cylinders symbolize α-helices. The N-termini are coloured in green and C-terminal β-sheets in red. The endonuclease domain of Sce VMA
(right part) is clearly separated from the self-
splicing domain (left part).