Phenotype prediction based on genome-wide DNA methylation data

Background

DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes, and hypomethylation with activation of oncogenes.

Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data, SPCA was inferior to the specific method EVORA.

Results

We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well in the training data to the test data and selects the very best models for final prediction based on parameters of the test data.

We have applied MS-SPCA to phenotype prediction from genome-wide DNAm data. CpGs used for prediction are selected by quantifying three features of their methylation: average methylation difference, methylation variation difference and methylation-age correlation. We analysed four independent case–control datasets that correspond to different stages of cervical cancer: (i) cases that are currently cytologically normal but will later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case–control datasets, with samples either human papillomavirus (HPV) positive or negative. We demonstrate that cytologically normal HPV+ and HPV- samples contain DNAm patterns that are associated with later neoplastic transformations. We present evidence that DNAm patterns exist in cytologically normal HPV- samples that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA.

Conclusions

MS-SPCA can be applied to many classification problems. Additional improvements could include the use of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect that MS-SPCA will be useful for analysing recent, larger DNAm datasets to predict future neoplastic transformations.
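
The three CpG features named in the Results, and a single-PC projection of the kind SPCA builds on, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cpg_feature_scores` and `first_pc_scores` are hypothetical, the abstract does not state how the features are thresholded or combined, and the model-selection step of MS-SPCA is left out because the test-data parameters it uses are not specified here. The sketch assumes a samples-by-CpGs matrix of methylation values and that no CpG is constant across samples.

```python
import numpy as np


def cpg_feature_scores(beta, is_case, age):
    """Quantify three per-CpG features (hypothetical helper, for illustration).

    beta    : (n_samples, n_cpgs) array of methylation values
    is_case : boolean array, True for cases and False for controls
    age     : array of sample ages
    Returns (mean_diff, var_diff, age_corr), each of length n_cpgs.
    """
    beta = np.asarray(beta, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    age = np.asarray(age, dtype=float)

    cases, controls = beta[is_case], beta[~is_case]

    # Average methylation difference between cases and controls.
    mean_diff = cases.mean(axis=0) - controls.mean(axis=0)

    # Difference in methylation variance between cases and controls.
    var_diff = cases.var(axis=0, ddof=1) - controls.var(axis=0, ddof=1)

    # Pearson correlation of each CpG with age (assumes non-constant CpGs).
    beta_c = beta - beta.mean(axis=0)
    age_c = age - age.mean()
    denom = np.sqrt((beta_c ** 2).sum(axis=0)) * np.sqrt((age_c ** 2).sum())
    age_corr = (beta_c * age_c[:, None]).sum(axis=0) / denom

    return mean_diff, var_diff, age_corr


def first_pc_scores(train, test, selected):
    """Project samples onto the first PC of the selected CpGs.

    The PC is fitted on the training samples only; the same centring and
    loading vector are then applied to the test samples.
    """
    train_sel = np.asarray(train, dtype=float)[:, selected]
    test_sel = np.asarray(test, dtype=float)[:, selected]

    mu = train_sel.mean(axis=0)
    # Rows of vt are the principal axes of the centred training data.
    _, _, vt = np.linalg.svd(train_sel - mu, full_matrices=False)
    pc1 = vt[0]

    return (train_sel - mu) @ pc1, (test_sel - mu) @ pc1
```

In a pipeline of this shape, CpGs would be ranked on one or more of the three scores, the top-ranked CpGs passed as `selected`, and the resulting first-PC scores used as the predictor. Note that the sign of a principal component is arbitrary, so the direction of the score has to be oriented using the training labels before it is applied to test data.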
