PLEMT: A Novel Pseudolikelihood-Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models

ABSTRACT Motivated by analyses of DNA methylation data, we propose a semiparametric mixture model, the generalized exponential tilt mixture model, to account for heterogeneity between differentially methylated and nondifferentially methylated subjects in the cancer group and to capture differences in the mean and in higher order moments such as the variance between subjects in the cancer and normal groups. A pairwise pseudolikelihood is constructed to eliminate the unknown nuisance function. To circumvent boundary and nonidentifiability problems similar to those in parametric mixture models, we modify the pseudolikelihood by adding a penalty function. We propose a pseudolikelihood-based expectation–maximization test and show that the proposed test statistic follows a simple chi-squared limiting distribution. Because of this simple asymptotic distribution, the test has computational advantages over permutation-based tests for high-dimensional genetic or epigenetic data. Simulation studies show that the proposed test controls the Type I error rate well and is more powerful than several existing tests. In particular, it outperforms the commonly used tests under all simulation settings considered, especially when the two groups differ in variance. The proposed test is applied to a real dataset to identify differentially methylated sites between ovarian cancer subjects and normal subjects. Supplementary materials for this article are available online.
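The claim that a pairwise pseudolikelihood eliminates the unknown nuisance function can be illustrated with a small numerical sketch. It assumes a two-sample setup in which control values x come from an unknown baseline density g0 and case values y come from the mixture h = (1 - λ)g0 + λg1. It also assumes the quadratic tilt g1(z) = g0(z)exp(α + β1 z + β2 z²), which lets the tilted component differ from g0 in both mean and variance. Conditioning each pair of one control value and one case value on its unordered values, in the spirit of the pairwise pseudolikelihood of Liang and Qin (2000), leaves only the density ratio r = h/g0, so g0 drops out. Pairs from the same sample contribute the constant 1/2 and are omitted. All function names, the quadratic choice of the tilt, and the penalty C log λ are illustrative assumptions, not the authors' implementation; the paper's parametrization and penalty may differ, and the sketch omits the EM iterations and the test statistic itself.

```python
import numpy as np


def density_ratio(z, lam, alpha, beta1, beta2):
    """r(z) = h(z) / g0(z) for h = (1 - lam) * g0 + lam * g1.

    Assumes (for illustration) the quadratic tilt
    g1(z) = g0(z) * exp(alpha + beta1*z + beta2*z**2).
    """
    z = np.asarray(z, dtype=float)
    return (1.0 - lam) + lam * np.exp(alpha + beta1 * z + beta2 * z**2)


def pairwise_log_pl(x, y, lam, alpha, beta1, beta2):
    """Sum over all (control i, case j) pairs of log[r(y_j) / (r(x_i) + r(y_j))].

    Each term is the probability that, given the unordered pair {x_i, y_j},
    y_j is the value that came from the case sample; g0 never appears.
    """
    rx = density_ratio(x, lam, alpha, beta1, beta2)
    ry = density_ratio(y, lam, alpha, beta1, beta2)
    # Broadcast to an n0-by-n1 grid: rows index controls, columns index cases.
    return float(np.sum(np.log(ry[None, :] / (rx[:, None] + ry[None, :]))))


def penalized_log_pl(x, y, lam, alpha, beta1, beta2, c=1.0):
    """Hypothetical penalty c * log(lam), which tends to -inf as lam -> 0 and
    so keeps a maximizer away from the null boundary lam = 0."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    return pairwise_log_pl(x, y, lam, alpha, beta1, beta2) + c * np.log(lam)


def normal_pdf(z, mu, sd):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def normal_tilt(mu, sd):
    """(alpha, beta1, beta2) such that N(mu, sd**2) = N(0, 1) * exp(tilt)."""
    alpha = -mu**2 / (2.0 * sd**2) - np.log(sd)
    beta1 = mu / sd**2
    beta2 = 0.5 - 0.5 / sd**2
    return alpha, beta1, beta2


def explicit_log_pl(x, y, lam, mu, sd):
    """The same pairwise quantity computed from full densities, with the
    (assumed, normally unknown) g0 = N(0, 1) and g1 = N(mu, sd**2).
    Used only to check that g0 cancels in pairwise_log_pl."""
    def h(z):
        return (1.0 - lam) * normal_pdf(z, 0.0, 1.0) + lam * normal_pdf(z, mu, sd)

    g0x, g0y = normal_pdf(x, 0.0, 1.0), normal_pdf(y, 0.0, 1.0)
    num = g0x[:, None] * h(y)[None, :]
    den = num + g0y[None, :] * h(x)[:, None]
    return float(np.sum(np.log(num / den)))
```

With a normal baseline the quadratic tilt is exact, so for parameters from `normal_tilt(mu, sd)` the value of `pairwise_log_pl` (which never evaluates g0) should match `explicit_log_pl` (which uses both densities) up to rounding. At λ = 0 the ratio r is identically 1, so every control/case pair contributes log(1/2), which is the homogeneity baseline a test compares against.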
