Genomics 113 (2021) 1098–1113
Available online 4 March 2021
0888-7543/© 2021 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).
Review
Bioinformatic tools for DNA methylation and histone modification:
A survey
Nasibeh Chenarani a, Abbasali Emamjomeh a,b,*, Abdollah Allahverdi c, SeyedAli Mirmostafa d, Mohammad Hossein Afsharinia d, Javad Zahiri d,e,**
a Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
b Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Bioinformatics, Faculty of Basic Sciences, University of Zabol, Zabol, Iran
c Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
d Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
e Department of Neuroscience, University of California, San Diego, USA
ARTICLE INFO
Keywords:
Epigenetics
Database
Prediction
Algorithm
Tools
ABSTRACT
Epigenetic inheritance occurs through different mechanisms such as chromatin and histone modifications, DNA methylation, and processes mediated by non-coding RNAs. It leads to changes in gene expression and to the emergence of new traits in different organisms, and it is involved in many diseases such as cancer. Recent advances in experimental methods have led to the identification of epigenetic target sites in various organisms, and computational approaches have enabled us to analyze the large volumes of data produced by these methods. Next-generation sequencing (NGS) methods have been broadly used to identify these target sites and their patterns. Using these patterns, the emergence of diseases could be predicted. In this study, target site prediction tools for two major epigenetic mechanisms, histone modification and DNA methylation, are reviewed. Publicly accessible databases are reviewed as well. Some suggestions regarding the state-of-the-art methods and databases are made, including examining patterns of epigenetic changes that are important in epigenotype detection.
1. Introduction
Epigenetic inheritance was first introduced in the 1940s as the interaction between genes and the environment that leads to the emergence of new phenotypes in organisms [105]. Epigenetics is the study of heritable phenotype changes that do not involve alterations in the DNA sequence [106].
Epigenetic changes are made under the influence of different
mechanisms such as DNA methylation, histone modification, chromatin
organization, and regulatory processes mediated by non-coding RNAs.
In fact, the importance of identifying DNA methylation in epigenetic
code is similar to the importance of identifying Expressed Sequence Tags
(ESTs) which provided the first outlook to genetic code [107].
Two of the aforementioned mechanisms cause major epigenetic modifications on DNA or chromatin: DNA methylation and histone post-translational modification (such as methylation, acetylation, phosphorylation, and sumoylation) [108]. Here, we briefly explain all four mechanisms:
1. DNA methylation: Genomic DNA is subjected to modifications that
change the gene expression profile. Methylation mostly causes gene
silencing. This is due to the effect of methylation on DNA binding
proteins (which are sensitive to methylation) or because of the
interaction with histone modifications which affects the access to
promoter sequences [109] (Fig. 1).
Cytosine can be methylated at certain positions. Methylation often takes place at CpG dinucleotides. CpG islands are genomic regions that are unusually rich in cytosine and guanine [110]. In mammals, DNA methylation occurs mostly at CpG dinucleotides [111]; in plants, however, cytosine residues can be methylated in any sequence context [111]. In fact, there are three classes of cytosine methylation in plants, occurring in the CG, CHG, and CHH sequence contexts (H = A, C, or T). The full set of methylation marks in a cell is called the methylome (Feinberg, 2001).
* Corresponding author at: Department of Plant Breeding and Biotechnology, University of Zabol, Po. Box: 98615-538, Zabol 9861335856, Iran.
** Corresponding author.
E-mail addresses: aliimamjomeh@uoz.ac.ir (A. Emamjomeh), zahiri@modares.ac.ir (J. Zahiri).
https://doi.org/10.1016/j.ygeno.2021.03.004
Received 22 May 2020; Received in revised form 10 October 2020; Accepted 2 March 2021
2. Histone post-translational modification: The nucleosome is the fundamental subunit of chromatin and consists of DNA wrapped around an octamer containing two copies each of histones H2A, H2B, H3, and H4. Nucleosomes are connected to one another via the linker histone H1. The amino-terminal tails of histone proteins are sites of post-translational modification, and any changes in their physical properties affect their interaction with DNA.
In addition to affecting chromatin structure, histone protein modi-
fications affect adaptors, chromatin-modifying enzymes, transcription
factors, repressors, and transcription regulation [47,48,63]. These
changes on the amino-terminal tail of histone H3 and H4 have been well
investigated, and more than 60 types of post-translational modifications
have been identified.
These changes include lysine acetylation, lysine and arginine methylation, serine phosphorylation, lysine ubiquitination, lysine sumoylation, proline isomerization, and glutamate ADP-ribosylation [49]. Among these, acetylation and methylation have been studied most extensively. Each of these post-translational modifications has specific functions and is known as a histone mark. Together, these marks are also known as the histone code or epigenetic code [42,97].
3. Chromatin conformation: Chromatin conformation can affect gene
functions by changing the accessibility rate and affinity of regulatory
proteins to their target sites.
Chromatin remodeling occurs in two ways:
a) Covalent modification of histones by specific enzymes such as histone acetyltransferases, deacetylases, methyltransferases, and kinases.
b) Moving, ejecting, or restructuring nucleosomes by ATP-dependent chromatin remodeling complexes [26].
4. Processes mediated by non-coding RNAs: Both small and long non-
coding RNAs have a profound role in gene transcription [43] and
can affect gene regulation via alteration in DNA methylation and
histone modification.
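The three cytosine contexts mentioned under DNA methylation above (CG, CHG, and CHH, with H = A, C, or T) can be illustrated with a minimal sketch. The function name `cytosine_contexts` is hypothetical and only the forward strand is scanned; a real analysis would also examine the reverse strand and would rely on an established methylation caller.

```python
# Minimal sketch: label each forward-strand cytosine as CG, CHG or CHH.
# `cytosine_contexts` is an illustrative name, not part of any library.

H = {"A", "C", "T"}  # IUPAC "H": any base except G


def cytosine_contexts(seq):
    """Return a list of (position, context) for every C in `seq`."""
    seq = seq.upper()
    calls = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else None
        nxt2 = seq[i + 2] if i + 2 < len(seq) else None
        if nxt == "G":
            context = "CG"
        elif nxt in H and nxt2 == "G":
            context = "CHG"
        elif nxt in H and nxt2 in H:
            context = "CHH"
        else:
            # Too close to the end of the sequence (or an ambiguous base).
            context = "unknown"
        calls.append((i, context))
    return calls


if __name__ == "__main__":
    for pos, ctx in cytosine_contexts("ACGTCAGCTTC"):
        print(pos, ctx)
```

Cytosines whose context cannot be resolved, for example at the end of a read, are reported as "unknown" rather than guessed.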
1.1. The importance of studying and identifying epigenetic sites
It is now clear that epigenetic changes play a critical role in human health and disease [54]. Also,
comprehensive studies have been done on plants under biotic and
abiotic stresses. Stress-induced epigenetic modifications comprise DNA
Fig. 2. Importance and application of predicting epigenetic patterns. The epigenetic inheritance includes DNA methylation, histone post-translational modifications,
chromatin conformation, and regulation by non-coding RNAs. Patterns that can predict these four modifications are used in the prediction and treatment of diseases,
epi-drugs design, and crop improvement against stresses.
Fig. 1. Comparison of gene expression between two sequences with and without methylated CpG islands: (A) gene expression; (B) gene silencing.
methylation, histone diversification, and histone N-tail modification. It
has been shown that these modifications regulate gene expression and
development of plants under stress [9].
Epigenetic modifications are known to correlate with changes in
gene expression. However, quantitative models that accurately predict
the up or down-regulation of gene expressions are currently lacking
[57]. Patterns for epigenetic modification prediction in organisms in
different conditions are discovered by studying epigenetic modifications
and applying mathematical models [6]. The importance of creating
patterns for epigenetic modification prediction is depicted as a flowchart
(Fig. 2).
Recent studies showed that information on epigenetic patterns can be used to develop models for predicting differentially expressed transcripts in complex diseases [57]. In addition, a promising new therapeutic approach, namely epigenetic therapy, has been developed. Epigenetic therapy aims to correct the derangements caused by epigenetic changes using natural or synthetic drugs [66]. Furthermore, similar approaches are used in crop improvement strategies, the creation of novel epialleles, and the regulation of transgene expression [90].
Different experimental methods have emerged for the detection of
epigenetic modification variations and distributions in organisms. These
methods provide information for epigenomic mapping based on the type
of modifications. Considering modifications that are done on genome
sequence (DNA methylation and histone post-translational modifica-
tions) experimental methods could be categorized into two groups,
which are described in the following.
2. Experimental methods for DNA methylation detection
DNA methylation is detected by three major approaches including
bisulfite conversion, methylation-specific restriction enzymes, and
immunoprecipitation of methylated DNA [29]. These approaches can be
divided into specific experimental methods based on whether each of
them uses either array-based or sequencing-based methods (Fig. 3).
The most important experimental method for studying methylation is bisulfite sequencing, which is considered the gold standard in DNA methylation studies. Because current DNA sequencing technologies cannot distinguish between methylated and unmethylated cytosine residues, a chemical pretreatment is required. Bisulfite treatment of DNA deaminates unmethylated cytosine to uracil; after PCR amplification the converted residues are read as thymine, and they are identified by Sanger sequencing of the PCR product. In contrast, 5-methylcytosine (5-mC) residues are resistant to this conversion and are still read as cytosine. Methylated cytosines can therefore be detected by comparing the Sanger sequencing read from an untreated DNA sample with that of the same sample treated with bisulfite. In the era of next-generation sequencing (NGS), this methodology can be extended to DNA methylation analysis across a whole genome. However, bisulfite sequencing has its own challenges. One is that genome complexity is largely reduced to three nucleotides, which makes post-NGS sequence alignment more difficult. In addition, bisulfite treatment fragments DNA, which makes long fragments harder to amplify and can result in the generation of chimeric reads (Fig. 4).
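The comparison of treated and untreated reads described above can be sketched in a few lines. This is an idealized illustration, assuming complete conversion, no sequencing errors, and a single forward strand; the function names `bisulfite_convert` and `call_methylation` are hypothetical.

```python
# Idealized sketch of bisulfite conversion and methylation calling.
# Assumptions: 100% conversion efficiency, error-free reads, forward strand only.


def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C (5-mC) is protected and is still read as C.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(untreated, treated):
    """Compare untreated and treated reads; return {position: is_methylated}."""
    if len(untreated) != len(treated):
        raise ValueError("reads must have the same length")
    calls = {}
    for i, (ref, obs) in enumerate(zip(untreated.upper(), treated.upper())):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True    # protected from conversion: methylated
        elif obs == "T":
            calls[i] = False   # converted: unmethylated
        # Any other base would be an error or variant; no call is made.
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    treated = bisulfite_convert(reference, methylated_positions=[1])
    print(treated)
    print(call_methylation(reference, treated))
```

In real data each cytosine is covered by many reads, so the per-position call becomes a ratio of C reads to C plus T reads rather than a single true or false value.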
Since the estimated level of DNA methylation depends on complete conversion of non-methylated cytosine residues, the completeness of this conversion must be verified. It is therefore necessary to control the bisulfite reaction and to check for cytosines at non-CpG sites after sequencing, which indicate incomplete conversion. Because the resulting ratio is a snapshot of all DNA isolated from the sample, the homogeneity of the cell population should be taken into consideration for correct and accurate interpretation of the DNA methylation level. A population of cells that differ in methylation (e.g., cancer samples) will have a dilution effect and thus bias the detected methylation level [59].
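The two quality considerations above can be made concrete with a short worked sketch. It assumes that non-CpG cytosines are essentially unmethylated in the sample (reasonable for many mammalian tissues, but not for plants), so that any cytosine still read as C at a non-CpG site reflects incomplete conversion. All function names and the example numbers are illustrative only.

```python
# Sketch: bisulfite conversion efficiency and the dilution effect.
# Assumption: non-CpG cytosines are unmethylated, so residual C reads at
# non-CpG sites are attributed to incomplete conversion.


def conversion_rate(non_cpg_c_reads, non_cpg_t_reads):
    """Fraction of non-CpG cytosines that were converted (read as T)."""
    total = non_cpg_c_reads + non_cpg_t_reads
    if total == 0:
        raise ValueError("no reads covering non-CpG cytosines")
    return non_cpg_t_reads / total


def methylation_level(c_reads, t_reads):
    """Methylation level at one site: C reads / (C + T reads)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site is not covered")
    return c_reads / total


def mixed_population_level(levels, fractions):
    """Expected bulk methylation level for a mixture of cell populations."""
    if len(levels) != len(fractions):
        raise ValueError("levels and fractions must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(level * frac for level, frac in zip(levels, fractions))


if __name__ == "__main__":
    # Hypothetical counts: 5 unconverted vs 995 converted non-CpG cytosines.
    print(conversion_rate(5, 995))
    # Hypothetical mixture: 25% tumor cells at 90% methylation,
    # 75% normal cells at 10% methylation.
    print(mixed_population_level([0.9, 0.1], [0.25, 0.75]))
```

In the hypothetical mixture, the bulk measurement lies far below the level in the minority population, which is the dilution effect described in the text.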
The only difference between whole-genome bisulfite sequencing (WGBS) and whole-genome sequencing (WGS) is the bisulfite conversion step. WGBS is the most comprehensive NGS approach for DNA methylation profiling. Its main limitations are the cost and the difficulty of analyzing the NGS data. As mentioned earlier, non-methylated
Fig. 3. Principles and experimental methods for DNA methylation detection. Combining these experimental principles with array-based or sequencing-based readouts yields a large assortment of techniques for DNA methylation analysis.
cytosines are read as thymines after bisulfite treatment, and DNA composed largely of just three bases is difficult to align. The need for large amounts of DNA was, until recently, another limitation of this technique, but a modification of the protocol that postpones the adaptor ligation step until after bisulfite treatment allows WGBS to be performed routinely from ~30 ng of DNA (in some cases, from only about 125 pg) [112].
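The loss of sequence complexity after bisulfite conversion can be illustrated with a small sketch that counts distinct k-mers before and after an in-silico C-to-T conversion. It uses a random sequence as a stand-in for a genome and assumes that every cytosine is unmethylated; the helper names and the choice of k = 8 are arbitrary.

```python
# Sketch: fewer distinct k-mers remain once C is read as T, which is why
# bisulfite-converted reads are harder to place uniquely on a genome.
# Assumptions: random toy "genome", all cytosines unmethylated, k = 8.
import random


def c_to_t(seq):
    """In-silico conversion of a fully unmethylated forward strand."""
    return seq.upper().replace("C", "T")


def distinct_kmers(seq, k):
    """Number of distinct substrings of length k in seq."""
    return len({seq[i:i + k] for i in range(len(seq) - k + 1)})


if __name__ == "__main__":
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(5000))
    k = 8
    print("distinct k-mers before conversion:", distinct_kmers(genome, k))
    print("distinct k-mers after conversion: ", distinct_kmers(c_to_t(genome), k))
```

Because the conversion maps four letters onto three, the number of distinct k-mers can only stay the same or decrease, so more positions in the genome become indistinguishable to an aligner.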
Reduced representation bisulfite sequencing (RRBS) alleviates both
limitations of WGBS, by interrogating only a fraction of the genome
[12,113,114,115].
In RRBS, enrichment of CpG-rich regions is achieved by isolation of
short fragments after MspI digestion which recognizes CCGG sites and
cuts both unmethylated and methylated residues. This technique en-
sures isolation of about 85% of CpG islands in the human genome. After
that, similar to WGBS, bisulfite conversion and library preparation are
performed.
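The enrichment step of RRBS can be sketched as an in-silico digestion. MspI recognizes CCGG and cleaves after the first cytosine (C^CGG), so every internal fragment starts and ends within a CpG-containing site. The size window used in the example is a placeholder, not a value taken from the text, and the function names are hypothetical.

```python
# Sketch: in-silico MspI digestion followed by size selection.
# MspI cuts C^CGG regardless of CpG methylation. The size window below is
# an illustrative placeholder, not a recommended protocol value.
import re


def mspi_fragments(seq):
    """Split seq at every MspI site (cut between the first C and CGG)."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    toy = "AACCGGTTTTCCGGA"
    frags = mspi_fragments(toy)
    print(frags)
    print(size_select(frags, min_len=4, max_len=8))
```

Only the selected short fragments would then proceed to bisulfite conversion and library preparation, which is how RRBS restricts sequencing to a CpG-rich fraction of the genome.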
About 1 μg of DNA is typically required for RRBS, although as little as 100 ng can be enough, provided that it is sufficiently pure for successful MspI digestion. Amplification of bisulfite-treated DNA poses several problems [115].
Enrichment for regions of interest or for CpG-containing genomic regions can also be performed before NGS. Such enrichment is achieved by hybridization with immobilized oligonucleotides and can be done before bisulfite conversion, using commercially available kits (e.g., Agilent's SureSelectXT Methyl-Seq). Alternatively, hybridization for enrichment can be done after bisulfite conversion using the SeqCap Epi System from Roche. Several versions of these kits allow enrichment for a small fraction of the genome that includes only the regions of interest; this approach is called targeted bisulfite sequencing. Both kits correlate well with RRBS and have the additional strength of covering more CpG-rich regions [116].
Direct detection of modified bases is a preferred technique due to all
Fig. 4. Principles of methylation analysis using bisulfite genomic sequencing. After treatment with sodium bisulfite, unmethylated cytosine residues are converted to uracil, whereas 5-methylcytosine (5mC) remains unaffected. After PCR amplification, uracil bases are read as thymine. DNA methylation status can be determined by direct sequencing of the PCR product [60].
of the disadvantages of bisulfite conversion. A method has already been developed by Pacific Biosciences that allows direct detection of methylated nucleotides by monitoring the kinetics of the polymerase during single-molecule sequencing [117].
However, this technology is currently used only for DNA methylation analysis of bacterial genomes. Recently, the development of nanopore-based single-molecule sequencing has also gained momentum. Like single-molecule real-time (SMRT) sequencing, it can detect modified residues directly [118,119]. Commercialization of these discoveries is expected to bring a next generation of instruments with even better specificity and sensitivity.
Bisulfite sequencing was the first genomic sequencing method developed for identifying methylated cytosines in genomic DNA. The method is based on the chemical change that bisulfite causes in cytosine: bisulfite converts cytosine to uracil but does not affect methylated cytosine residues in the sequence. After conversion, the target sequence is amplified by PCR using a pair of specific primers. All uracil and thymine residues are amplified as thymine, whereas 5-methylcytosine residues, which are unaffected by bisulfite, are amplified as cytosine. After amplification, the PCR product is sequenced directly [22].
3. Experimental methods for profiling post-translational histone
modifications
The most widely used strategy for chromatin analysis is the chromatin immunoprecipitation (ChIP) assay [53]. In this procedure, protein and DNA in chromatin (the substance of the cell nucleus, comprising DNA and associated proteins) are temporarily cross-linked, and the chromatin is fragmented by mechanical shearing into short DNA-protein fragments.
Fragments containing the protein of interest are then precipitated from the supernatant by bead-immobilized antibodies. The proteins are then removed from the collected fragments, and the DNA is purified and sequenced. The result is a list of short DNA fragments that had been bound by the protein of interest. While ChIP is valuable, it can detect and analyze only a single histone modification at a time. Conventional strategies for investigating chromatin are limited in their ability to provide enough data at the single-cell level for rare-cell or cell-differentiation research [13,53,73].
Recent advances in micro/nanotechnology open the way to investigating chromatin at the single-cell level with devices that offer precise control, improved handling, and higher throughput. In this section, current work on chromatin in micro/nanofluidic channels and its potential future applications are discussed. We divide the discussion into two parts: the first covers strategies that aim to enhance ChIP performance and are chiefly microfluidic, and the second covers non-ChIP techniques, including third-generation nanofluidic platforms. Microfluidic platforms can greatly improve antibody binding and shorten immunoprecipitation times in ChIP assays. Oh et al. used microfluidics to demonstrate an efficient methodology for ChIP assays [75]. Advances in micro- and nanofabrication have led to the development of devices that exploit new physical principles to monitor single cells and molecules with high accuracy [81,82].
In this methodology, an array of nanostructures is used to isolate individual DNA molecules for optical monitoring. Nanoscale structures smaller than the illumination wavelength reduce the optical observation volume to the point that the enzymatic incorporation of individual nucleotides by a single DNA polymerase can be monitored in real time [55].
A new approach to stretching a DNA or chromatin molecule is to use nano-confinement [120] in a channel whose width and height are smaller than the DNA persistence length (~50 nm in physiological conditions) [94].
Microfluidic automation has been used to speed up ChIP experiments and to reduce the amount of sample and time required (Fig. 5) [24,99]. These microfluidic ChIP devices contain a network of tiny valves and chambers fabricated in a patterned silicone elastomer [95].
The coordinated valves allow small sample volumes to be injected in a controlled manner and processed in miniaturized chambers, which can improve antibody-target interactions and shorten incubation time. Instead of manual pipetting and multistep protocols with considerable sample losses, small interconnecting chambers and channels
Fig. 5. Coiled chromatin molecules are extended when confined in a nanochannel [69,100].