Predicting methylation status of human DNA sequences by pseudo-trinucleotide composition.

DNA methylation plays a key role in the regulation of gene expression. The most common DNA modification is the methylation of cytosine in the CpG dinucleotide. DNA methylation has been detected mostly by experimental methods; however, these methods are time-consuming and expensive, and they struggle to meet the requirements of modern large-scale sequencing technology. Accordingly, automatic and reliable prediction methods for DNA methylation are needed. In this study, the pseudo-trinucleotide composition is proposed as a representation of DNA sequences, and a novel prediction method is developed that uses it as the input to a support vector machine (SVM). The model was evaluated on two datasets: the dataset of Rollins (dataset_1) and a dataset of healthy human records collected from the MethDB database (dataset_2). For dataset_1, the accuracy (ACC) and Matthews correlation coefficient (MCC) by jackknife validation were 0.8051 and 0.6098, respectively. For dataset_2, the ACC and MCC were 0.8500 and 0.7203, respectively. These results indicate that the pseudo-trinucleotide composition is an effective representation of DNA sequences and can play an important role in the prediction of DNA function.
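As a rough illustration of the ideas named in the abstract, the sketch below computes a plain trinucleotide composition (64 normalized frequencies of overlapping triplets) and the two reported metrics from a confusion matrix. It is not the authors' implementation: the abstract does not define the additional "pseudo" components, so they are omitted here, and the function names and the example counts are hypothetical. The resulting 64-element vector is the kind of fixed-length input an SVM could be trained on.

```python
from itertools import product
from math import sqrt

# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq):
    """Return the 64 normalized frequencies of overlapping trinucleotides.

    Assumption: this is only the plain composition; the "pseudo" terms of
    the pseudo-trinucleotide composition are not defined in the abstract.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing N or other symbols
            counts[tri] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy and Matthews correlation coefficient from counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report MCC as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical usage: a toy sequence and made-up confusion-matrix counts.
features = trinucleotide_composition("ACGCG")
acc, mcc = accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10)
```

In a jackknife (leave-one-out) evaluation, each sequence would be held out in turn, the classifier trained on the rest, and the pooled predictions summarized with these two metrics.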
