Bayesian hierarchical structured variable selection methods with application to molecular inversion probe studies in breast cancer

The analysis of genomic alterations that may occur in nature when segments of chromosomes are copied (known as copy number alterations) has been a focus of research to identify genetic markers of cancer. One high throughput technique that has recently been adopted is the use of molecular inversion probes to measure probe copy number changes. The resulting data consist of high dimensional copy number profiles that can be used to ascertain probe-specific copy number alterations in correlative studies with patient outcomes to guide risk stratification and future treatment. We propose a novel Bayesian variable selection method, the hierarchical structured variable selection method, which accounts for the natural gene and probe-within-gene architecture to identify important genes and probes associated with clinically relevant outcomes. The method is designed for grouped variable selection, where simultaneous selection of both groups and within-group variables is of interest. It uses a discrete mixture prior distribution for group selection and group-specific Bayesian lasso hierarchies for variable selection within groups. We also provide an extension that accounts for serial correlations within groups by incorporating Bayesian fused lasso methods for within-group selection. Through simulations we establish that our method results in lower model errors than other methods when a natural grouping structure exists. We apply our method to a molecular inversion probe study of breast cancer and show that it identifies genes and probes that are significantly associated with clinically relevant subtypes of breast cancer.
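
As a rough illustration of the two-level prior described in the abstract, the sketch below draws coefficients from a generic group-level spike-and-slab mixture combined with a within-group Bayesian lasso (Laplace as a scale mixture of normals). It is not the paper's exact specification: the function name `sample_group_lasso_prior`, the parameters `p_include`, `lam` and `sigma2`, and the choice of one shared lasso rate for all groups are assumptions made for the example.

```python
import numpy as np

def sample_group_lasso_prior(group_sizes, p_include=0.5, lam=1.0, sigma2=1.0, seed=0):
    """Draw one set of coefficients from an illustrative two-level prior.

    Hypothetical sketch, not the authors' model:
      gamma_g ~ Bernoulli(p_include)            (group selected or not)
      tau2_gj ~ Exponential(rate = lam^2 / 2)   (Bayesian lasso scale mixture)
      beta_gj | gamma_g = 1 ~ N(0, sigma2 * tau2_gj)
      beta_gj | gamma_g = 0 = 0                 (point mass at zero)
    """
    rng = np.random.default_rng(seed)
    # Group-level inclusion indicators from the discrete mixture.
    gammas = rng.random(len(group_sizes)) < p_include
    betas = []
    for include, size in zip(gammas, group_sizes):
        if include:
            # NumPy parameterises the exponential by its scale (1 / rate).
            tau2 = rng.exponential(scale=2.0 / lam**2, size=size)
            beta = rng.normal(0.0, np.sqrt(sigma2 * tau2))
        else:
            beta = np.zeros(size)
        betas.append(beta)
    return gammas, betas

# Example: three "genes" with 3, 2 and 4 "probes" each.
gammas, betas = sample_group_lasso_prior([3, 2, 4], p_include=0.5)
```

In this sketch an excluded group has all of its coefficients set exactly to zero, while coefficients in an included group are shrunk individually through their own `tau2` scales. That mirrors the abstract's idea of selecting genes and probes within genes simultaneously.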
