MINIREVIEW
A study of microRNAs in silico and in vivo: bioinformatics approaches to microRNA discovery and target identification
Malik Yousef1,2, Louise Showe3 and Michael Showe3
1 The Galilee Society Institute of Applied Research, Israel
2 Al-Qasemi Academic College, Baqa Algharbiya, Israel
3 Systems Biology Division, The Wistar Institute, Philadelphia, PA, USA
Keywords bioinformatics; machine learning; microRNA; microRNA target
Correspondence M. Yousef, Research & Development Center, The Galilee Society, P.O. Box 437, Shefa-Amr 20200, Israel Fax: +972 4 95044525 Tel: +972 4 9504523 ⁄ 4 E-mail: yousef@gal-soc.org
The discovery that microRNAs (miRNAs) are synthesized as hairpin-containing precursors and share many features has stimulated the development of several computational approaches for identifying new miRNA genes in various animal species. Many of these approaches rely heavily on conservation of sequence within and between species, whereas others emphasize machine-learning methods to screen hairpin candidates for structural features shared with known miRNA precursors. The identification of animal miRNA targets is a particularly difficult problem because an exact match to the target sequence is not required. We discuss the most recently devised algorithms for miRNA and target discovery. We do not discuss plant miRNAs because their varying sizes and structural characteristics pose different problems of identification and target selection.
(Received 27 August 2008, revised 9 January 2009, accepted 22 January 2009)
doi:10.1111/j.1742-4658.2009.06933.x
Machine-learning approaches to microRNA discovery
Methods derived from the machine-learning field have recently been applied to microRNA (miRNA) discovery with good success. Machine learning depends on the development of algorithms and methods that allow a specific computer program to learn from data already collected on verified miRNAs. These algorithms require a training set for the learning process that consists of positive examples (which define the miRNA characteristics) and negative examples (the control set of non-miRNA sequences). The known miRNAs used as positive examples can be downloaded from the miRBase database [1,2], and random sequences can be one choice of negative set. The most important tasks associated with the learning process are the identification of characteristics, and the definition of the rules that define the positive class. This is especially important in this case because these characteristics are not always explicitly defined. Readers who wish to pursue machine learning in greater detail are referred to a recent review [3].
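The training set described above can be illustrated with a minimal sketch. It assumes two toy features (sequence length and GC fraction) and made-up placeholder sequences rather than real miRBase entries; the function names are hypothetical, and the random negatives are only the simple choice of negative set mentioned in the text.

```python
import random

ALPHABET = "ACGU"

def random_sequence(length, rng):
    """Draw a random RNA sequence to serve as a negative example."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def features(seq):
    """Toy feature vector (an assumption): sequence length and GC fraction."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    return [len(seq), gc]

def build_training_set(positives, rng):
    """Label positives 1 and pair each with one random negative (label 0)."""
    examples = [(features(seq), 1) for seq in positives]
    examples += [(features(random_sequence(len(seq), rng)), 0) for seq in positives]
    return examples

# Hypothetical placeholder sequences, not real miRNAs.
positives = ["ACGUACGUACGUACGUACGUAC", "GGGAAACCCUUUGGGAAACCC"]
training_set = build_training_set(positives, random.Random(0))
```

In practice the features would be the structural and sequence characteristics of hairpin precursors discussed in the following sections, and the choice of negative set strongly affects what the classifier learns.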
Examples of supervised machine-learning algorithms include Naïve Bayes, support vector machines (SVM), hidden Markov models (HMM), neural networks and the k-nearest neighbor algorithm. Naïve Bayes is a classification model obtained by applying a relatively simple method to a training data set [4]. A Naïve Bayes classifier calculates the probability that a given instance (example) belongs to a certain class. SVMs are widely used machine-learning algorithms developed by Vapnik [5]. In this technique, the numbers describing each feature of a miRNA are combined into a single vector in an n-dimensional space. The algorithm compares the vectors from the positive class with those
Abbreviations HMM, hidden Markov models; miRNA, microRNA; SVM, support vector machine.
FEBS Journal 276 (2009) 2150–2156 © 2009 The Authors Journal compilation © 2009 FEBS
M. Yousef et al.
Bioinformatics for miRNA discovery
from the negative class and finds a ‘hyperplane’ that produces the best separation (margin) between the two classes. The ‘support vectors’ are the samples from the two classes that are closest together but still separable; they ‘support’ the separating hyperplane (Fig. 1). This algorithm has proven particularly useful for various classification problems, especially when the two classes are closely related or nonuniform, and has recently been widely used in the bioinformatics field [6–8].

Fig. 1. The solid line is the separating hyperplane and the dashed lines are the margins for an SVM trained with samples from two classes. Samples (points) on the margin are called the support vectors.

MicroRNA discovery tools

Numerous computational approaches (in addition to machine learning) have been implemented for miRNA gene prediction using methods based on sequence conservation and/or structural similarity [9–13]. Some of these tools are listed in Table 1. Lim et al. [9] developed a program for identification of miRNAs, called MiRscan, with 70% specificity at a sensitivity of 50%. MiRscan uses seven miRNA features with associated weights to build a computational tool, which assigns scores to hairpin candidates. The weights are estimated using statistics based on the previously known miRNAs from Caenorhabditis elegans. Grad et al. [13] developed a computational method using conservation and sequence structural similarity to predict miRNAs in the C. elegans genome. Lai et al. [12] used similar ideas to develop a different computational tool, called miRseeker, for the Drosophila genome. These efforts were previously reviewed by Bartel [14]. Others have used homology searches for revealing paralog and ortholog miRNAs [10,15–18]. Additionally, Wang et al. [19] developed a method for miRNA identification based on sequence and structure alignment.

ProMiR [20] is based on machine learning for miRNA discovery. ProMiR uses a highly specific probabilistic model (HMM) whose topology and states are handcrafted based on prior knowledge and assumptions, and whose exact probabilities are derived from the accumulated data. Pfeffer et al. [20a] used SVMs for predicting conserved miRNAs in herpesviruses. The features that defined the positive class were extracted from the sequence and structural features in the stem loop. The negative class was generated from mRNAs, rRNAs or tRNAs from human and viral genomes, which should not include any miRNA sequences. The same approach was also applied for the analysis of clustered miRNAs [21] using a tool named mir-abela, while Xue et al. [27] developed an SVM classifier as a two-class tool that does not rely on comparative genomic approaches. They defined a negative class called pseudo precursor-miRNAs (pre-miRNAs). The criteria for this negative class included a minimum of 18 paired bases, a maximum of −15 kcal·mol−1 folding free energy and no multiple loops. The tool is called the triplet-SVM. BayesMiRNAfind [22] is a machine-learning approach based on the Naïve Bayes classifier for predicting miRNA genes. This method
Table 1. Computational tools for miRNA predictions.

Algorithm           Web link                                                      References
MiRseeker           -                                                             Lai et al. [12]
MiRscan             http://genes.mit.edu/mirscan/                                 Lim et al. [9,11]
miRank              MiRank is programmed in MATLAB                                Xue et al. [27]
proMiR II           http://cbit.snu.ac.kr/~ProMiR2/                               Nam et al. [20]
mir-abela           http://www.mirz.unibas.ch/cgi/pred_miRNA_genes.cgi            Sewer et al. [21]
triplet-SVM         http://bioinfo.au.tsinghua.edu.cn/mirnasvm/                   Xue et al. [27]
VMir                http://www.hpi-hamburg.de/fileadmin/downloads/VMir.zip        Grundhoff et al., 2006 [23]
RNAmicro            http://www.bioinf.uni-leipzig.de/~jana/software/index.html    Hertel & Stadler [24]
mirCoS              Based on LIBSVM library package [30]                          Sheng et al. [25]
BayesMiRNAFind      https://bioinfo.wistar.upenn.edu/miRNA/miRNA/login.php        Yousef et al. [22]
One-ClassMirnaFind  http://wotan.wistar.upenn.edu/OneClassmiRNA/                  Yousef et al. [26]
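The three criteria quoted in the text for the pseudo pre-miRNA negative class of triplet-SVM (at least 18 paired bases, a folding free energy of at most −15 kcal·mol−1 and no multiple loops) can be sketched as a simple filter. This is an illustration under stated assumptions, not the published implementation: the structure is assumed to be given in dot-bracket notation, "paired bases" is read as the number of base pairs, the free energy is assumed to come from an external folding program, and all function names are hypothetical.

```python
def count_base_pairs(structure):
    """Number of base pairs in a dot-bracket string (one '(' per pair)."""
    return structure.count("(")

def is_single_hairpin(structure):
    """True for one stem-loop: balanced brackets and no '(' after the first ')'."""
    opens = structure.count("(")
    closes = structure.count(")")
    if opens == 0 or opens != closes:
        return False
    first_close = structure.index(")")
    return "(" not in structure[first_close:]

def passes_pseudo_hairpin_criteria(structure, free_energy,
                                   min_pairs=18, max_energy=-15.0):
    """Apply the three quoted criteria; free_energy is in kcal/mol."""
    return (is_single_hairpin(structure)
            and count_base_pairs(structure) >= min_pairs
            and free_energy <= max_energy)

# Toy structures for illustration only.
hairpin = "(" * 20 + "." * 8 + ")" * 20
two_stems = "((((((((((....))))))))))" * 2
```

A candidate that branches into several stems, such as `two_stems`, is rejected by the loop check regardless of its energy.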
differs from previous efforts in two ways: (a) it generates the model automatically and identifies rules based on the miRNA gene structure and sequence, allowing prediction of nonconserved miRNAs; and (b) it uses a comparative analysis over multiple species to reduce the false-positive rate. This allows for a trade-off between sensitivity and specificity. The resulting algorithm demonstrates higher specificity and similar sensitivity to algorithms that use conserved genomic regions to reduce false positives [9,11–13]. Grundhoff et al. [23] developed an approach to identify miRNAs that is based on bioinformatics and array-based technologies. The bioinformatics tool, VMir [23], does not rely on evolutionary sequence conservation. RNAmicro [24] is another miRNA prediction tool, developed by Hertel & Stadler, that relies mainly on comparative sequence analysis, rather than on structural features, using a two-class SVM.

Sheng et al. [25] describe a computational method, mirCoS, that applies three SVM models (based on sequence, secondary structure and conservation), sequentially, to identify new conserved miRNA candidates in mammalian genomes.

examples, as well as the query sequences, during the training and classification steps.

We should note, in passing, that high-throughput methods for sequencing isolated small RNAs provide a new tool for identifying new miRNA species [28], and a new method for amplifying low-concentration miRNAs allows easier testing of predictions [29].

Target identification
Although recent findings [31] suggest that miRNAs may affect gene expression by binding to either 5′- or 3′-UTRs of mRNA, most studies have found that miRNAs mark their target mRNAs for degradation or suppress their translation by binding to the 3′-UTR, and most target programs search there. These studies have suggested that the miRNA seed segment, which includes six to eight nucleotides at the 5′ end of the mature miRNA sequence, is very important in the selection of the target site (Fig. 2). Thus, most of the computational tools developed to identify mRNA target sequences depend heavily on complementarity between the miRNA seed sequence and the target sequence. Diana-microT [32] was one of the first computational tools for target prediction that identified specific interaction rules based on bioinformatics and experimental approaches. The tool successfully recovered all validated C. elegans miRNA targets.
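The seed-complementarity search that most target programs start from can be sketched as follows. This is a minimal illustration, assuming perfect Watson-Crick pairing only and a seed taken as nucleotides 2 to 8 of the mature miRNA (one common convention within the six to eight nucleotides mentioned above); the sequences are made up and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_sites(mirna, utr, seed_start=1, seed_length=7):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    seed_start=1 and seed_length=7 select nucleotides 2-8 (an assumption).
    """
    seed = mirna[seed_start:seed_start + seed_length]
    site = reverse_complement(seed)
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions

# Hypothetical sequences for illustration only.
mirna = "UACGGAUCCAGUUGACUAGCUA"
utr = "AAAGAUCCGUAAACCCGAUCCGUUU"
sites = seed_sites(mirna, utr)  # [3, 16]
```

Real tools go further than this exact-match step, for example by allowing G:U wobble pairs, scoring the rest of the duplex and checking conservation, which is where the methods reviewed below differ.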
Defining the negative class is a major challenge in developing machine-learning algorithms for miRNA identification. Two machine-learning approaches have recently appeared for identifying miRNAs without the necessity of defining a negative class. Yousef et al. presented a study using one-class machine learning for miRNA discovery, using only positive data to build the classifier (One-ClassMirnaFind [26]). Several different classifiers, including two classes of SVM, were used to compare the one-class approach with the corresponding two-class methods. Although the two-class procedure was generally found to be superior, it was more complex to implement.
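The idea of learning from positive examples alone can be illustrated with a deliberately simple sketch. It is not the One-ClassMirnaFind method: as a stand-in (an assumption), a candidate is accepted when its standardized distance to the centroid of the positive examples does not exceed the largest distance seen in training. The two features and all values are hypothetical.

```python
import math

def fit_one_class(positives):
    """Fit a centroid-distance model from positive feature vectors only."""
    n = len(positives)
    dims = len(positives[0])
    mean = [sum(x[d] for x in positives) / n for d in range(dims)]
    # Fall back to 1.0 when a feature has zero spread to avoid division by zero.
    std = [math.sqrt(sum((x[d] - mean[d]) ** 2 for x in positives) / n) or 1.0
           for d in range(dims)]

    def distance(x):
        return math.sqrt(sum(((x[d] - mean[d]) / std[d]) ** 2 for d in range(dims)))

    threshold = max(distance(x) for x in positives)
    return distance, threshold

def predict(model, x):
    """True if x lies within the region covered by the positive examples."""
    distance, threshold = model
    return distance(x) <= threshold

# Hypothetical features: (number of stem base pairs, GC fraction).
positives = [[24, 0.45], [26, 0.50], [22, 0.48], [25, 0.52]]
model = fit_one_class(positives)
```

No negative set has to be designed, which is the attraction of the one-class setting, but the decision boundary depends entirely on how the threshold is chosen.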
Xue et al. [27] recently developed a tool called miRank. miRank [27] is a novel ranking algorithm based on a random walk through a graph consisting of known miRNA examples and unknown candidate sequences. Each miRNA is a vertex connected to its neighbor by an edge that is weighted by the similarity of their miRNA features. The score or relevance of a vertex increases with its number of connections. The vertices are then ranked by relevance score, and an arbitrary cut-off of the ranked list includes both the positive examples and the most similar of the predicted unknowns. The strength of miRank is its ability to identify novel miRNAs in newly sequenced genomes where there are few annotated miRNAs (positive examples). The authors found miRank to be superior to SVM classifiers, and attribute its success to the fact that it structures the list and ranks the candidate

Fig. 2. The duplex for miRNA hsa-miR-579 and its target LRIG3 is partitioned into two parts, the seed part and the out-seed part. The seed part is indicated by capital letters.

Several additional methods for the prediction of miRNA targets have been subsequently developed. These methods mainly use sequence complementarities, thermodynamic stability calculations and evolutionary conservation among species to determine the likelihood of formation of a productive miRNA–mRNA duplex [14,33]. John et al. [34] developed the miranda algorithm for miRNA target prediction. miranda uses dynamic programming to search for optimal sequence complementarities between a set of mature miRNAs and a given mRNA. microRNA.org (http://www.microrna.org) [35] is a comprehensive resource of miRNA target predictions and miRNA ‘expression profiles’. Target predictions are based on the miranda algorithm, whereas miRNA ‘expression profiles’ are derived from a comprehensive sequencing project of a large set of mammalian tissues and cell lines of normal
and disease origin. Another algorithm, rnahybrid [36,37], is similar to an RNA secondary-structure prediction algorithm, like the mfold program [38], but it determines the most favorable hybridization site between two sequences.

Brennecke et al. [39] have recently suggested that the 3′ out-seed segment of the miRNA–mRNA duplex can compensate for imperfect base pairing of the target with the seed segment, and a recent computational approach [40] has considered the contributions of both seed and out-seed miRNA segments in target identification. Using sequence conservation reduces false-positive predictions but, as a result, some less-conserved target sites may be missed. This presents a dilemma: how to avoid rejection of these less highly conserved target sites while still reducing the very large numbers of predictions that are found when seed region conservation in the target is not required. In order to reduce the false-positive predictions inherent in methods that heavily weight specific target sequence conservation, Lewis et al. [41] developed TargetScanS. TargetScanS scores target sites based on the conservation of the target sequences between five genomes (human, mouse, rat, dog and chicken) because evolutionarily conserved target sequences are more likely to be true targets. In testing, TargetScanS was able to recover targets for all 5300 human genes known at the time to be targeted by miRNAs (Table 2).

PicTar [42] is a computational method used to detect common miRNA targets in vertebrates, C. elegans and Drosophila. PicTar is based on a statistical method applied to eight vertebrate genome-wide alignments [multiple alignments of orthologous nucleotide sequences (3′-UTRs)]. PicTar was able to recover validated miRNA targets at an estimated 30% false-positive rate. In a separate study, PicTar was applied to target identification in Drosophila melanogaster [43]. These studies suggest that one miRNA can target 54 genes, on average, and that known miRNAs are projected to regulate a large fraction of all D. melanogaster genes (15%). This is likely to be a conservative estimate because of the incomplete input data.

TargetBoost [44] is a machine-learning algorithm for miRNA target prediction using only sequence information to create weighted sequence motifs that capture the binding characteristics between miRNAs and their targets. The authors suggest that TargetBoost is stable and identifies more of the already verified true targets than do other existing algorithms.

Sung-Kyu et al. [45] also reported the development of a machine-learning algorithm using SVM. The best reported results were 0.921 sensitivity and 0.833 specificity. More recently, Yan et al. [40] used a machine-learning approach that employs features extracted from both seed and out-seed segments. The best result obtained was an accuracy of 82.95%, which was generated using only 48 positive and 16 negative human examples, a relatively small training set with which to assess the algorithm.

In 2006, Thadani & Tammi [46] launched MicroTar, a novel statistical computational tool for prediction of miRNA targets from RNA duplexes, which does not use sequence homology for prediction. MicroTar mainly relies on a quite novel approach to estimate the duplex energy. However, the reported sensitivity (60%) is significantly lower than that achieved using other published algorithms. At the same time, a miRNA pattern-discovery method, RNA22 [47], was proposed for use in scanning UTR sequences for targets. RNA22 does not rely upon cross-species conservation but was able to recover most of the known target sites with validation of some of its new predictions.

More recently, Yousef et al. described a target-prediction method (NBmiRTar [48]) using, instead, machine learning by a Naïve Bayes classifier. NBmiRTar does not require sequence conservation but generates a model from sequence and miRNA–mRNA duplex information derived from validated target sequences and artificially generated negative examples. In this case, both the seed and the ‘out-seed’ segments
Table 2. MicroRNA target prediction tools.

Algorithm     Web link                                              References
TargetScanS   http://genes.mit.edu/targetscan                       Lewis et al. [41]
miRanda       http://www.microrna.org                               John et al. [34]
PicTar        http://pictar.bio.nyu.edu                             Krek et al. [42]
RNAhybrid     http://bibiserv.techfak.uni-bielefeld.de/rnahybrid    Rehmsmeier et al. [36]
Diana-microT  http://www.diana.pcbi.upenn.edu/cgi-bin/micro_t.cgi   Kiriakidou et al. [32]
TargetBoost   https://demo1.interagon.com/demo                      SaeTrom et al. [44]
RNA22         http://cbcsrv.watson.ibm.com/rna22_targets.html       Miranda et al. [47]
MicroTar      http://tiger.dbs.nus.edu.sg/microtar/                 Thadani and Tammi [46]
NBmiRTar      http://wotan.wistar.upenn.edu/NBmiRTar                Yousef et al. [48]
miRecords     http://mirecords.umn.edu/miRecords/                   Xiao et al. [51]
of the miRNA–mRNA duplex are used for target identification. The NBmiRTar technique produces fewer false-positive predictions and fewer target candidates to be tested than miranda [34]. It exhibits higher sensitivity and specificity than algorithms that rely only on conserved genomic regions to decrease false-positive predictions.
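How a Naïve Bayes classifier turns validated targets and artificial negatives into a class probability can be sketched as follows. This is a generic Bernoulli Naïve Bayes with Laplace smoothing, not the NBmiRTar implementation; the three binary duplex features and the training examples are hypothetical.

```python
import math

def train_naive_bayes(examples):
    """examples: list of (binary_feature_tuple, label) pairs."""
    labels = sorted({label for _, label in examples})
    n_features = len(examples[0][0])
    model = {}
    for label in labels:
        rows = [x for x, y in examples if y == label]
        prior = len(rows) / len(examples)
        # Laplace-smoothed probability that each feature equals 1 in this class.
        p_one = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(n_features)]
        model[label] = (prior, p_one)
    return model

def posterior(model, x):
    """Class probabilities for x, assuming features are independent given the class."""
    log_scores = {}
    for label, (prior, p_one) in model.items():
        score = math.log(prior)
        for value, p in zip(x, p_one):
            score += math.log(p if value else 1 - p)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical features: (perfect seed match, strong out-seed pairing, low duplex energy).
examples = [((1, 1, 1), "target"), ((1, 0, 1), "target"), ((1, 1, 0), "target"),
            ((0, 0, 1), "nontarget"), ((0, 1, 0), "nontarget"), ((0, 0, 0), "nontarget")]
model = train_naive_bayes(examples)
probabilities = posterior(model, (1, 1, 1))
```

Because the output is a probability rather than a yes/no call, candidates can be ranked and a cut-off chosen to trade sensitivity against the number of predictions that must be tested.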
In a 2004 review, Lai [33] noted that there is almost no overlap among the predicted targets identified by the various methods and suggested that each tool captures a subset of the entire target class as a function of the specific features it has incorporated into its prediction model. More recently, Sethupathy et al. [49] conducted a comparison of the five most commonly used tools for mammalian target prediction. This study indicated that 30% of the experimentally validated target sites are nonconserved, supporting the need for the development of different or complementary computational approaches to capture new target sites. Furthermore, the large number of predictions that each of these tools is producing suggests that the heavy reliance on homology or comparative-sequence analysis is not sufficient to generate accurate predictions with a high sensitivity and that there are yet-to-be identified recognition parameters that must be considered.

Databases for microRNA and targets

There is a variety of very useful databases that provide a significant amount of information on miRNA and target predictions (Table 3). The most extensive database for both miRNA and target sequences is miRBase [1]. miRBase contains miRNA mature sequences, hairpin sequences of precursors and associated annotation. Release 12.0 of the database contains 8619 entries representing hairpin precursor miRNAs, responsible for the production of 8273 mature miRNA products, in primates, rodents, birds, fish, worms, flies, plants and viruses. miRBase also contains predicted miRNA target genes in miRBase Targets, and provides a gene naming and nomenclature function in the miRBase Registry. The miRNA target genes are predicted using the miRanda tool [34] and are not necessarily experimentally validated.

TarBase [49] contains a set of experimentally supported targets in different species that are collected manually from the literature. TarBase version 5 has more than 1300 experimentally supported miRNA target interactions. The database has information about the target site described by the duplex of miRNA and gene. It also includes information on the experiments that were conducted to test the target, the sufficiency of the site to induce translational repression and/or cleavage and a reference to the paper used to extract the information.

Argonaute [50] is a compilation of comprehensive information on mammalian miRNAs, their origin and regulated target genes, in an exhaustively curated database. The source information of Argonaute is from both literature and other databases (Table 3).

The most recently released database, miRecords [51], is an integrated resource for animal miRNA–target interactions. miRecords stores predicted miRNA targets produced by 11 established miRNA target prediction programs.

Table 3. Databases for microRNA and targets.

Database   Web link
miRBase    http://microrna.sanger.ac.uk/
TarBase    http://diana.cslab.ece.ntua.gr/tarbase/
Argonaute  http://www.ma.uni-heidelberg.de/apps/zmf/argonaute/
miRecords  http://mirecords.umn.edu/miRecords/

References

1 Griffiths-Jones S, Saini HK, van Dongen S & Enright AJ (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res 36, D154–D158.
2 Griffiths-Jones S (2004) The microRNA Registry. Nucleic Acids Res 32, D109–D111.
3 Larranaga P et al. (2006) Machine learning in bioinformatics. Brief Bioinform 7, 86–112.
4 Mitchell T (1997) Machine Learning. McGraw Hill, New York, NY.
5 Vapnik V (1995) The Nature of Statistical Learning Theory. John Wiley & Sons, New York, NY.
6 Haussler D (1999) Convolution Kernels on Discrete Structures. Technical Report UCSC-CRL-99-10. Baskin School of Engineering, University of California, Santa Cruz, CA.
7 Pavlidis P, Weston J, Cai J & Grundy WN (2001) Gene functional classification from heterogeneous data. In Proceedings of the Fifth Annual International Conference on Computational Biology, pp. 249–255. ACM Press, Montreal.
8 Donaldson I et al. (2003) PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics 4, 11.
9 Lim LP, Glasner ME, Yekta S, Burge CB & Bartel DP (2003) Vertebrate microRNA genes. Science 299, 1540.
10 Weber MJ (2005) New human and mouse microRNA genes found by homology search. FEBS J 272, 59–73.
11 Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB & Bartel DP (2003) The microRNAs of Caenorhabditis elegans. Genes Dev 17, 991–1008.
12 Lai E, Tomancak P, Williams R & Rubin G (2003) Computational identification of Drosophila microRNA genes. Genome Biol 4, R42.
13 Grad Y, Aach J, Hayes GD, Reinhart BJ, Church GM, Ruvkun G & Kim J (2003) Computational and experimental identification of C. elegans microRNAs. Mol Cell 11, 1253.
14 Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281.
15 Lagos-Quintana M, Rauhut R, Lendeckel W & Tuschl T (2001) Identification of novel genes coding for small expressed RNAs. Science 294, 853–858.
16 Lau NC, Lim LP, Weinstein EG & Bartel DP (2001) An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862.
17 Lee RC & Ambros V (2001) An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862–864.
18 Pasquinelli AE et al. (2000) Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86.
19 Wang X, Zhang J, Li F, Gu J, He T, Zhang X & Li Y (2005) MicroRNA identification based on sequence and structure alignment. Bioinformatics 21, 3610–3614.
20 Nam J-W, Shin K-R, Han J, Lee Y, Kim VN & Zhang B-T (2005) Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 33, 3570–3581.
20a Pfeffer S, Sewer A, Lagos-Quintana M, Sheridan R, Sander C, Grasser FA, van Dyk LF, Ho CK, Shuman S, Chien M et al. (2005) Identification of microRNAs of the herpesvirus family. Nat Meth 2, 269–276.
21 Sewer A et al. (2005) Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics 6, 267.
22 Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC & Showe MK (2006) Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier. Bioinformatics 22, 1325–1334.
23 Grundhoff A, Sullivan CS & Ganem D (2006) A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses. RNA 12, 733–750.
24 Hertel J & Stadler PF (2006) Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics 22, e197–e202.
25 Sheng Y, Engstrom PG & Lenhard B (2007) Mammalian microRNA prediction through a support vector machine model of sequence and structure. PLoS ONE 2, e946.
26 Yousef M, Jung S, Showe L & Showe M (2008) Learning from positive examples when the negative class is undetermined – microRNA gene identification. Algorithms Mol Biol doi:10.1186/1748-7188-3-2.
27 Xue C, Li F, He T, Liu G-P, Li Y & Zhang X (2005) Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics 6, 310.
28 Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP & Tizard ML (2008) A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 18, 957–964.
29 Berezikov E, van Tetering G, Verheul M, van de Belt J, van Laake L, Vos J, Verloop R, van de Wetering M, Gurvey V, Takada S et al. (2006) Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16, 1289–1298.
30 Chang C-C & Lin C-J (2001) LIBSVM: a library for support vector machines.
31 Lytle JR, Yario TA & Steitz JA (2007) Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc Natl Acad Sci 104, 9667–9672.
32 Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, Mourelatos Z & Hatzigeorgiou A (2004) A combined computational-experimental approach predicts human microRNA targets. Genes Dev 18, 1165–1178.
33 Lai E (2004) Predicting and validating microRNA targets. Genome Biol 5, 115.
34 John B, Enright AJ, Aravin A, Tuschl T, Sander C & Marks DS (2004) Human microRNA targets. PLoS Biol 2, e363.
35 Betel D, Wilson M, Gabow A, Marks DS & Sander C (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res 36, D149–D153.
36 Rehmsmeier M, Steffen P, Höchsmann M & Giegerich R (2004) Fast and effective prediction of microRNA/target duplexes. RNA 10, 1507–1517.
37 Kruger J & Rehmsmeier M (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res 34, W451–W454.
38 Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13), 3406–3415.
39 Brennecke J, Stark A, Russell RB & Cohen SM (2005) Principles of microRNA-target recognition. PLoS Biol 3, e85.
40 Yan X et al. (2007) Improving the prediction of human microRNA target genes by using ensemble algorithm. FEBS Lett 581, 1587.
41 Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP & Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115, 787.
42 Krek A et al. (2005) Combinatorial microRNA target predictions. Nat Genet 37, 495–500.
43 Grun D, Wang Y-L, Langenberger D, Gunsalus KC & Rajewsky N (2005) microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS Comput Biol 1, e13.
44 SaeTrom OLA, Snove OJ & SaeTrom PAL (2005) Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms. RNA 11, 995–1003.
45 Sung-Kyu K, Jin-Wu N, Wha-Jin L & Byoung-Tak Z (2005) A Kernel method for microRNA target prediction using sensible data and position-based features. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, pp. 46–52. CIBCB, La Jolla, CA.
46 Thadani R & Tammi M (2006) MicroTar: predicting microRNA targets from RNA duplexes. BMC Bioinformatics 7, S20.
47 Miranda KC, Huynh T, Tay Y, Ang Y-S, Tam W-L, Thomson AM, Lim B & Rigoutsos I (2006) A pattern-based method for the identification of microRNA binding sites and their corresponding heteroduplexes. Cell 126, 1203–1217.
48 Yousef M, Jung S, Kossenkov AV, Showe LC & Showe MK (2007) Naive Bayes for microRNA target predictions – machine learning for microRNA targets. Bioinformatics 23, 2987–2992.
49 Sethupathy P, Corda B & Hatzigeorgiou AG (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12, 192–197.
50 Shahi P et al. (2006) Argonaute – a database for gene regulation by mammalian microRNAs. Nucleic Acids Res 34, D115–D118.
51 Xiao F, Zuo Z, Cai G, Kang S, Gao X & Li T (2009) miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res 37, D105–D110.