Shared kernel Bayesian screening

This article concerns testing for equality of distribution between groups. We focus on screening variables with shared distributional features such as common support, modes and patterns of skewness. We propose a Bayesian testing method using kernel mixtures, which improves performance by borrowing information across the different variables and groups through shared kernels and a common probability of group differences. The inclusion of shared kernels in a finite mixture, with Dirichlet priors on the weights, leads to a simple framework for testing that scales well for high-dimensional data. We provide closed asymptotic forms for the posterior probability of equivalence in two groups and prove consistency under model misspecification. The method is applied to DNA methylation array data from a breast cancer study, and compares favourably to competitors when Type I error is estimated via permutation.
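The role of the Dirichlet priors on the kernel weights can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that every observation has already been hard-assigned to one of K shared kernels, so each group is summarised by a vector of kernel counts, and that the prior probability of equivalence is a fixed number rather than a quantity learned across variables. Under those assumptions the marginal likelihoods are Dirichlet-multinomial and the posterior probability that the two groups share the same weights has a closed form. The names `log_multivariate_beta`, `posterior_prob_equal`, and `prior_equal` are hypothetical, and this is not the article's full procedure.

```python
from math import lgamma, exp, log


def log_multivariate_beta(a):
    """Log of the multivariate beta function B(a) = prod Gamma(a_k) / Gamma(sum a_k)."""
    return sum(lgamma(x) for x in a) - lgamma(sum(a))


def posterior_prob_equal(counts0, counts1, alpha, prior_equal=0.5):
    """Posterior probability that two groups share the same kernel weights.

    counts0, counts1: per-kernel allocation counts for each group
                      (assumed known here; a simplification).
    alpha:            Dirichlet prior parameters on the kernel weights.
    prior_equal:      fixed prior probability of equivalence, in (0, 1)
                      (an assumption of this sketch).
    """
    pooled = [a + c0 + c1 for a, c0, c1 in zip(alpha, counts0, counts1)]
    post0 = [a + c0 for a, c0 in zip(alpha, counts0)]
    post1 = [a + c1 for a, c1 in zip(alpha, counts1)]
    log_b_alpha = log_multivariate_beta(alpha)

    # H0: one weight vector generates the allocations in both groups.
    log_m0 = log_multivariate_beta(pooled) - log_b_alpha
    # H1: each group has its own weight vector, independent a priori.
    log_m1 = (log_multivariate_beta(post0) - log_b_alpha
              + log_multivariate_beta(post1) - log_b_alpha)

    # Posterior log-odds of H0, converted to a probability without overflow.
    log_odds = log(prior_equal) - log(1.0 - prior_equal) + log_m0 - log_m1
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    alpha = [1.0, 1.0]
    print(posterior_prob_equal([10, 10], [10, 10], alpha))  # similar groups
    print(posterior_prob_equal([20, 0], [0, 20], alpha))    # very different groups
```

With identical allocation counts in the two groups the posterior favours equivalence, and with allocations concentrated on different kernels it falls towards zero. In a screening setting one such probability would be computed per variable.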
