Bioinformatic tools for DNA methylation and histone modification: An overview

Genomics 113 (2021) 1098–1113

Available online 4 March 2021

Review

Bioinformatic tools for DNA methylation and histone modification: A survey

Nasibeh Chenarani, Abbasali Emamjomeh, Abdollah Allahverdi, SeyedAli Mirmostafa, Mohammad Hossein Afsharinia, Javad Zahiri

Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran

Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Bioinformatics, Faculty of Basic Sciences, University of Zabol, Zabol, Iran

Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran

Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran

Department of Neuroscience, University of California, San Diego, USA

ARTICLE INFO

Keywords: Epigenetics, Database, Prediction, Algorithm, Tools

ABSTRACT

Epigenetic inheritance occurs through different mechanisms such as chromatin and histone modifications, DNA methylation, and processes mediated by non-coding RNAs. It leads to changes in gene expression, to the emergence of new traits in different organisms, and to many diseases such as cancer. Recent advances in experimental methods have led to the identification of epigenetic target sites in various organisms, and computational approaches have enabled us to analyze the large volumes of data these methods produce. Next-generation sequencing (NGS) methods have been broadly used to identify these target sites and their patterns. Using these patterns, the emergence of diseases could be predicted. In this study, target site prediction tools for two major epigenetic mechanisms, histone modification and DNA methylation, are reviewed. Publicly accessible databases are reviewed as well. Some suggestions regarding the state-of-the-art methods and databases are made, including examining patterns of epigenetic changes that are important in epigenotype detection.

1. Introduction

Epigenetic inheritance was first introduced in the 1940s as the interaction between genes and the environment that leads to the emergence of new phenotypes in organisms [105]. Epigenetics is the study of heritable phenotypic changes that do not involve alterations in the DNA sequence [106].

Epigenetic changes arise through different mechanisms such as DNA methylation, histone modification, chromatin organization, and regulatory processes mediated by non-coding RNAs. Indeed, the importance of identifying DNA methylation for the epigenetic code is comparable to that of identifying Expressed Sequence Tags (ESTs), which provided the first view of the genetic code [107].

Among the aforementioned mechanisms, two cause major epigenetic modifications on DNA or chromatin: DNA methylation and histone post-translational modification (such as methylation, acetylation, phosphorylation, and sumoylation) [108]. Here, we briefly explain all four mechanisms:

1. DNA methylation: Genomic DNA is subject to modifications that change the gene expression profile. Methylation mostly causes gene silencing, either through its effect on methylation-sensitive DNA-binding proteins or through its interaction with histone modifications, which affects access to promoter sequences [109] (Fig. 1).

Cytosine can be methylated at certain positions. Methylation often takes place where CpG dinucleotides are present. CpG islands are genomic regions that are unusually rich in cytosine and guanine [110]. In mammals, DNA methylation occurs mostly at CpG dinucleotides [111]; in plants, however, cytosine residues can be methylated in any sequence context [111]. In fact, there are three groups of cytosine methylation in plants, which occur in CG, CHG, and CHH sequences (H = A, C, or T). The full set of all methylations in the cell is called the methylome (Feinberg, 2001).

* Corresponding author at: Department of Plant Breeding and Biotechnology, University of Zabol, Po. Box: 98615-538, Zabol 9861335856, Iran.

** Corresponding author.

E-mail addresses: aliimamjomeh@uoz.ac.ir (A. Emamjomeh), zahiri@modares.ac.ir (J. Zahiri).


https://doi.org/10.1016/j.ygeno.2021.03.004

Received 22 May 2020; Received in revised form 10 October 2020; Accepted 2 March 2021


2. Histone post-translational modification: The nucleosome is the fundamental subunit of chromatin and consists of DNA wrapped around two copies each of histones H2A, H2B, H3, and H4. Nucleosomes are connected to one another with the linker histone H1. The amino-terminal tails of histone proteins are sites of post-translational modification, and any change in their physical properties affects their interaction with DNA.

In addition to affecting chromatin structure, histone modifications affect adaptors, chromatin-modifying enzymes, transcription factors, repressors, and transcription regulation [47,48,63]. Modifications on the amino-terminal tails of histones H3 and H4 have been well investigated, and more than 60 types of post-translational modification have been identified.

These changes include lysine acetylation, lysine and arginine methylation, serine phosphorylation, lysine ubiquitination, lysine sumoylation, proline isomerization, and glutamate ADP-ribosylation [49]. Among these, acetylation and methylation have been studied most extensively. Each of these post-translational modifications has its own specific function and is known as a histone “mark”. Together, these marks are also known as the histone code or epigenetic code [42,97].

3. Chromatin conformation: Chromatin conformation can affect gene

functions by changing the accessibility rate and affinity of regulatory

proteins to their target sites.

Chromatin remodeling is done in two ways:

a) Covalent modification of histones by specific enzymes such as histone acetyltransferases, deacetylases, methyltransferases, and kinases.

b) Moving, ejecting, or restructuring nucleosomes by ATP-dependent chromatin remodeling complexes [26].

4. Processes mediated by non-coding RNAs: Both small and long non-coding RNAs have a profound role in gene transcription [43] and can affect gene regulation via alterations in DNA methylation and histone modification.
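The three plant cytosine contexts named under DNA methylation above (CG, CHG, and CHH, with H = A, C, or T) can be illustrated with a short sketch. This is a minimal illustration rather than a method from the reviewed tools: the function names `cytosine_context` and `count_contexts` are hypothetical, and only the forward strand is scanned.

```python
def cytosine_context(seq, i):
    """Return 'CG', 'CHG', 'CHH', or None for the base at index i.

    Only the forward strand is considered (an assumption of this sketch);
    a real analysis would also scan the reverse complement.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]  # up to two bases after the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # too close to the sequence end to tell CHG from CHH
    if downstream[0] in "ACT":  # H = A, C, or T
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return None  # ambiguous bases such as N


def count_contexts(seq):
    """Count forward-strand cytosines in each of the three contexts."""
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts


example = "ACGTCAGCTTC"
print(count_contexts(example))
```

In the example sequence, the cytosine at index 1 is in a CG context, the one at index 4 in CHG, and the one at index 7 in CHH; the final cytosine has no downstream bases and is left unclassified.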

1.1. The importance of studying and identifying epigenetic sites

It is now clear that epigenetic changes play a critical role in human health and disease [54]. Comprehensive studies have also been carried out on plants under biotic and abiotic stresses. Stress-induced epigenetic modifications comprise DNA methylation, histone diversification, and histone N-tail modification. It has been shown that these modifications regulate gene expression and the development of plants under stress [9].

Fig. 1. Comparison of gene expression between two sequences with and without methylated CpG islands: (A) gene expression; (B) gene silencing.

Fig. 2. Importance and application of predicting epigenetic patterns. Epigenetic inheritance includes DNA methylation, histone post-translational modifications, chromatin conformation, and regulation by non-coding RNAs. Patterns that can predict these four types of modification are used in the prediction and treatment of diseases, epi-drug design, and crop improvement against stresses.

Epigenetic modifications are known to correlate with changes in gene expression. However, quantitative models that accurately predict the up- or down-regulation of gene expression are currently lacking [57]. Patterns for predicting epigenetic modifications in organisms under different conditions are discovered by studying epigenetic modifications and applying mathematical models [6]. The importance of establishing such patterns is depicted as a flowchart (Fig. 2).

Recent studies have shown that information on epigenetic patterns can be used to develop models for predicting differentially expressed transcripts in complex diseases [57]. In addition, a promising new therapeutic approach, epigenetic therapy, has been developed. Epigenetic therapy tries to resolve the derangements caused by epigenetic changes with natural or synthetic drugs [66]. Similar approaches are used in crop improvement strategies, the creation of novel epialleles, and the regulation of transgene expression [90].

Different experimental methods have emerged for detecting the variation and distribution of epigenetic modifications in organisms. These methods provide information for epigenomic mapping based on the type of modification. For the modifications made on the genome itself (DNA methylation and histone post-translational modifications), experimental methods can be categorized into two groups, which are described below.

2. Experimental methods for DNA methylation detection

DNA methylation is detected by three major approaches: bisulfite conversion, methylation-specific restriction enzymes, and immunoprecipitation of methylated DNA [29]. Each approach can be further divided into specific experimental methods according to whether the readout is array-based or sequencing-based (Fig. 3).

The most important experimental method for studying methylation is bisulfite sequencing, which is considered the “gold standard” approach in DNA methylation studies. The technique was developed because standard DNA sequencing cannot distinguish between methylated and unmethylated cytosine residues. Bisulfite treatment of DNA deaminates cytosine to uracil; after PCR amplification, the converted residues are read as thymine in Sanger sequencing. 5-Methylcytosine (5-mC) residues, in contrast, are resistant to this conversion and are still read as cytosine. Methylated cytosines can therefore be detected by comparing the Sanger sequencing read from an untreated DNA sample with the read from the same sample after bisulfite treatment. With next-generation sequencing (NGS), this methodology can be extended to DNA methylation analysis across a whole genome. However, bisulfite sequencing has its own challenges. One is that the complexity of the genome is reduced to three nucleotides, which makes post-NGS sequence alignment more difficult. In addition, bisulfite treatment fragments DNA, which makes long fragments harder to amplify and can result in the generation of chimeric reads (Fig. 4).
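The comparison described above, between an untreated read and the bisulfite-treated read of the same strand, can be sketched in a few lines. This is a toy simulation under stated assumptions: conversion is complete, there are no sequencing errors, and both reads are the same length and already aligned. The function names `bisulfite_convert` and `call_methylation` are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C is read as T; 5-mC (indices in methylated_positions)
    is protected and still read as C. Assumes complete conversion.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(untreated, treated):
    """Call each cytosine of the untreated read by comparing it to the treated read."""
    if len(untreated) != len(treated):
        raise ValueError("reads must be the same length and aligned")
    calls = {}
    for i, (before, after) in enumerate(zip(untreated.upper(), treated.upper())):
        if before != "C":
            continue
        if after == "C":
            calls[i] = "methylated"      # resisted conversion
        elif after == "T":
            calls[i] = "unmethylated"    # converted C -> U -> T
        else:
            calls[i] = "ambiguous"       # mismatch unrelated to conversion
    return calls


untreated = "ACGTCCGA"
treated = bisulfite_convert(untreated, methylated_positions={1})
print(treated)
print(call_methylation(untreated, treated))
```

Here only the cytosine at index 1 is marked as methylated, so it is the only cytosine that survives conversion; the cytosines at indices 4 and 5 are read as thymine and called unmethylated.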

Since the estimated level of DNA methylation depends on complete conversion of non-methylated cytosine residues, the completeness of this conversion must be verified. It is therefore necessary to control the bisulfite reaction and to watch for cytosines at non-CpG sites after sequencing, which indicate incomplete conversion. Because the resulting ratio is a snapshot of all DNA isolated from the sample, the homogeneity of the cell population should be taken into consideration for a correct and accurate interpretation of the DNA methylation level. A population of cells that differ in methylation (e.g., cancer samples) will have a dilution effect and thus alter the detected methylation level [59].
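The quantities discussed above can be made concrete with a small sketch. It assumes the common convention that the methylation level at a site is the fraction of reads showing C among reads showing C or T, and that non-CpG cytosines are unmethylated, so any C read there reflects incomplete conversion (an assumption that does not hold for plants). The function names and the two-population mixing model are illustrative and are not taken from the cited work.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine: C / (C + T)."""
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage, level undefined
    return c_reads / total


def conversion_rate(non_cpg_c_reads, non_cpg_t_reads):
    """Estimate bisulfite conversion efficiency from non-CpG cytosines.

    Assumes non-CpG cytosines are unmethylated, so every C read counted
    here is treated as a conversion failure.
    """
    total = non_cpg_c_reads + non_cpg_t_reads
    if total == 0:
        return None
    return non_cpg_t_reads / total


def mixed_level(fraction_a, level_a, level_b):
    """Observed level when two cell populations are pooled (dilution effect)."""
    return fraction_a * level_a + (1.0 - fraction_a) * level_b


print(methylation_level(30, 70))
print(conversion_rate(5, 995))
print(mixed_level(0.25, 1.0, 0.0))
```

The last call illustrates the dilution effect: if a fully methylated population makes up a quarter of the sample and the remainder is unmethylated, the measured level at that site is one quarter, even though no individual cell is 25% methylated.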

The only difference between whole-genome bisulfite sequencing (WGBS) and whole-genome sequencing (WGS) is the bisulfite treatment step. WGBS is the most comprehensive NGS method for DNA methylation profiling. Its main limitations are the cost and the difficulty of analyzing the NGS data. As mentioned earlier, non-methylated cytosines are read as thymines after bisulfite treatment, and DNA composed largely of just three bases is difficult to assemble. Until recently, the need for large amounts of DNA was another limitation of this technique, but a modification of the protocol that postpones the adaptor ligation step until after bisulfite treatment allows WGBS to be performed routinely from ~30 ng of DNA (and in some cases from only about 125 pg) [112].

Fig. 3. Principles and experimental methods for DNA methylation detection. Combining the detection principles with the two types of readout (array-based and sequencing-based) yields a large assortment of techniques for DNA methylation analysis.
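The loss of sequence complexity that makes bisulfite reads hard to place can be shown with a toy example. The sketch below converts every C to T in silico, as happens to a fully unmethylated forward-strand read, and counts distinct k-mers before and after. The function names and the example sequence are hypothetical. Several bisulfite-aware aligners handle the problem by applying the same kind of in-silico conversion to both reads and reference before alignment.

```python
def to_three_letter(seq):
    """In-silico C->T conversion, mimicking a fully converted forward-strand read."""
    return seq.upper().replace("C", "T")


def distinct_kmers(seq, k):
    """Set of all distinct substrings of length k in seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


genome = "ACGTATGTCCTT"
converted = to_three_letter(genome)

# Two different 4-mers in the original become indistinguishable after conversion.
print(to_three_letter("ACGT"), to_three_letter("ATGT"))

# Fewer distinct k-mers means more positions a short read could map to.
print(len(distinct_kmers(genome, 4)), len(distinct_kmers(converted, 4)))
```

In this example the original sequence has nine distinct 4-mers and the converted one has seven, because windows that differed only by C versus T collapse onto each other.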

Reduced representation bisulfite sequencing (RRBS) alleviates both

limitations of WGBS, by interrogating only a fraction of the genome

[12,113,114,115].

In RRBS, CpG-rich regions are enriched by isolating short fragments after digestion with MspI, which recognizes CCGG sites and cuts them regardless of whether they are methylated. This technique ensures isolation of about 85% of the CpG islands in the human genome. After that, as in WGBS, bisulfite conversion and library preparation are performed.
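The enrichment step can be sketched as an in-silico digest. The sketch assumes the standard MspI cut position (after the first C of CCGG, i.e. C^CGG), ignores methylation because MspI cuts either way, and uses an arbitrary size window for the "short fragment" selection. The function names `mspi_fragments` and `size_select` and the window bounds are hypothetical, not values from the RRBS protocol.

```python
import re


def mspi_fragments(seq):
    """Digest a sequence at every CCGG site, cutting after the first C (C^CGG)."""
    seq = seq.upper()
    cut_sites = [match.start() + 1 for match in re.finditer("(?=CCGG)", seq)]
    bounds = [0] + cut_sites + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the chosen window (inclusive)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


sequence = "AACCGGTTTTCCGGA"
fragments = mspi_fragments(sequence)
print(fragments)
print(size_select(fragments, min_len=4, max_len=10))
```

Every internal fragment produced this way starts with CGG and ends with C, so each selected fragment carries CpG sites at its ends, which is what biases the library toward CpG-rich regions.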

About 1 μg of DNA is typically required for RRBS. As little as 100 ng can be enough, however, provided it is sufficiently pure for successful MspI digestion. There are several problems with amplification of bisulfite-treated DNA [115].

Enrichment for regions of interest or CpG-containing genomic regions can be performed before NGS. Such enrichment is achieved by hybridization with immobilized oligonucleotides and can be done before bisulfite conversion; kits for this are commercially available (e.g., Agilent’s SureSelectXT Methyl-Seq). Alternatively, with the SeqCap Epi System from Roche, hybridization for enrichment is done after bisulfite conversion. Several versions of these kits allow enrichment for a small fraction of the genome that includes only the regions of interest. This method is called targeted bisulfite sequencing. Both kits correlate well with RRBS and additionally cover more CpG-rich regions [116].

Given all of the disadvantages of bisulfite modification, direct detection of modified bases is a preferable approach. Pacific Biosciences has already developed a method that allows direct detection of methylated nucleotides by monitoring the kinetics of the polymerase during single-molecule sequencing [117].

Fig. 4. Principles of methylation analysis using bisulfite genomic sequencing. After treatment with sodium bisulfite, unmethylated cytosine residues are converted to uracil, whereas 5-methylcytosine (5mC) remains unaffected. After PCR amplification, uracil bases are read as thymine. DNA methylation status can be determined by direct PCR sequencing [60].

However, this technology is currently used only for DNA methylation analysis of bacterial genomes. Recently, the development of nanopore-based single-molecule sequencing has gained momentum; like single-molecule real-time (SMRT) sequencing, it can detect modified residues directly [118,119]. Commercialization of these discoveries is expected to bring a next generation of instruments with even better specificity and sensitivity.

Bisulfite sequencing was the first genomic sequencing method developed for identifying methylated cytosine in genomic DNA. It is based on the changes that bisulfite causes in cytosine: bisulfite converts cytosine to uracil but does not affect methylated cytosine residues in the sequence. After conversion, the target sequence is amplified by PCR with a pair of specific primers. All uracil and thymine residues are amplified as thymine, whereas 5-methylcytosine residues, which are unaffected by bisulfite, are amplified as cytosine. After amplification, the product is sequenced directly [22].

3. Experimental methods for profiling post-translational histone modifications

The most widely used strategy for chromatin analysis is the chromatin immunoprecipitation (ChIP) assay [53]. In this procedure, the protein and DNA in chromatin (the substance of the cell nucleus, comprising DNA and its associated proteins) are temporarily cross-linked, and the chromatin is fragmented by mechanical shearing into short DNA-protein fragments.

Fragments containing the proteins of interest are then precipitated from the supernatant by bead-immobilized antibodies. The proteins are removed from the collected fragments, and the DNA is purified and sequenced. The outcome is a list of short DNA fragments that had been bound by a particular protein of interest. While ChIP is valuable, it can detect and analyze only a single histone modification at a time. Traditional strategies for investigating chromatin are also limited in their ability to provide enough data at the single-cell level for research on rare cells or cell differentiation [13,53,73].
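As an illustration of how such a list of sequenced fragments might be summarized downstream, the sketch below counts how many fragments overlap each fixed-size genomic bin. It is a simplified illustration only: the input format (half-open start and end coordinates on a single chromosome), the function name `bin_coverage`, and the bin size are assumptions, and real ChIP analyses involve read alignment, controls, and statistical peak calling that are not shown here.

```python
from collections import Counter


def bin_coverage(fragments, bin_size):
    """Count fragments overlapping each bin.

    fragments: iterable of (start, end) pairs, half-open, on one chromosome.
    Returns a dict mapping bin index -> number of overlapping fragments.
    """
    counts = Counter()
    for start, end in fragments:
        if end <= start:
            continue  # skip empty or malformed intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for bin_index in range(first_bin, last_bin + 1):
            counts[bin_index] += 1
    return dict(counts)


fragments = [(0, 50), (40, 120), (250, 300)]
print(bin_coverage(fragments, bin_size=100))
```

Bins with many overlapping fragments are candidate regions bound by the protein or carrying the histone modification targeted by the antibody.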

Recent advances in micro/nanotechnology open up the possibility of investigating chromatin at the single-cell level with devices that allow precise control, improved handling, and higher-throughput studies. In this section, current work on chromatin in micro/nanofluidic channels and its potential future applications is discussed. We divide the section into two parts: the first covers strategies that aim to improve ChIP performance and are chiefly microfluidic, and the second covers non-ChIP techniques, including third-generation nanofluidic platforms. Microfluidic platforms can greatly improve antibody–target binding and immunoprecipitation times in ChIP assays. Oh et al. used microfluidics to demonstrate an efficient ChIP assay [75]. Advances in micro- and nanofabrication have prompted the development of devices that exploit new physical principles, allowing single cells and molecules to be investigated with high precision [81,82].

In this methodology, an array of nanostructures is used to isolate the DNA molecule for optical monitoring. Nanoscale structures, smaller than the illumination wavelength, reduce the optical observation volume to such an extent that the enzymatic incorporation of individual nucleotides by a single DNA polymerase can be monitored in real time [55].

A new approach to stretching out a DNA or chromatin molecule is to use nano-confinement [120] in a channel whose width and height are smaller than the DNA persistence length (~50 nm under physiological conditions) [94].

Microfluidic automation has been used to speed up ChIP experiments and to reduce the required sample amount and time (Fig. 5) [24,99]. These microfluidic ChIP devices contain a network of tiny valves and chambers fabricated in a silicone elastomer [95].

The integrated valves allow small sample volumes to be controllably injected and processed in miniaturized chambers, which can improve antibody–target association and reduce incubation time. Instead of manual pipetting and multistep protocols with substantial sample losses, small interconnecting chambers and channels

Fig. 5. Coiled chromatin molecules extended in a nanochannel [69,100].
