pETM: a penalized Exponential Tilt Model for analysis of correlated high‐dimensional DNA methylation data

Motivation: DNA methylation plays an important role in many biological processes and in cancer progression. Recent studies have found that groups can differ in methylation variance as well as in methylation means. Several methods have been developed that consider both mean and variance signals to improve the statistical power for detecting differentially methylated loci. Moreover, because methylation levels of neighboring CpG sites are known to be strongly correlated, methods that incorporate these correlations have also been developed. We previously developed a network‐based penalized logistic regression for correlated methylation data, but it focuses only on mean signals. We also developed a generalized exponential tilt model that captures both mean and variance signals, but it examines only one CpG site at a time.

Results: In this article, we propose a penalized Exponential Tilt Model (pETM) using network‐based regularization that captures both mean and variance signals in DNA methylation data and takes into account the correlations among nearby CpG sites. By combining the strengths of the two models we previously developed, pETM achieves superior power and better performance, as we demonstrate through simulations and an application to the 450K DNA methylation array data of the four breast invasive carcinoma subtypes from The Cancer Genome Atlas (TCGA) project. The pETM method identifies many cancer‐related methylation loci that were missed by our previously developed method, which considers correlations among nearby methylation loci but not variance signals.

Availability and Implementation: The R package ‘pETM’ is publicly available through CRAN: http://cran.r-project.org.

Contact: sw2206@columbia.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
