Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer

In this paper, we propose a new method, remMap (REgularized Multivariate regression for identifying MAster Predictors), for fitting multivariate response regression models in the high-dimension-low-sample-size setting. remMap is motivated by the study of regulatory relationships among different biological molecules based on multiple types of high-dimensional genomic data. In particular, we are interested in the influence of DNA copy number alterations on RNA transcript levels. To this end, we model the dependence of RNA expression levels on DNA copy numbers through multivariate linear regressions and use appropriate regularization both to handle the high dimensionality and to incorporate desired network structures. Criteria for selecting the tuning parameters are also discussed. The performance of the proposed method is illustrated through extensive simulation studies. Finally, remMap is applied to a breast cancer study in which genome-wide RNA transcript levels and DNA copy numbers were measured for 172 tumor samples. We identify a trans-hub region in cytoband 17q12-q21 whose amplification influences the RNA expression levels of more than 30 unlinked genes. These findings may lead to a better understanding of breast cancer pathology.
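The abstract does not spell out the regularization. The following is a minimal sketch that *assumes* a combined penalty on the p x q coefficient matrix: an entrywise L1 (lasso) term plus a row-wise L2 (group-lasso) term. Under this penalty, each row collects one predictor's effects on all responses, and a row that survives the group penalty marks a candidate "master predictor". The penalty form, the proximal-gradient solver, the helper names `prox_sparse_group` and `fit_l1_l2_multivariate`, the fixed penalty levels and the simulated data are all illustrative assumptions, not the authors' remMap implementation. The sketch also omits the network-structure constraints and the tuning-parameter criteria.

```python
import numpy as np


def prox_sparse_group(B, t_l1, t_l2):
    """Row-wise proximal operator of t_l1*||b||_1 + t_l2*||b||_2 for each row b of B.

    For this sum of norms the prox is the composition: soft-threshold every
    entry by t_l1, then shrink each whole row toward zero by t_l2. A row is
    one predictor's coefficients across all responses.
    """
    S = np.sign(B) * np.maximum(np.abs(B) - t_l1, 0.0)
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    # The floor avoids division by zero; rows that are already zero stay zero.
    scale = np.maximum(0.0, 1.0 - t_l2 / np.maximum(norms, 1e-12))
    return S * scale


def fit_l1_l2_multivariate(X, Y, lam1, lam2, n_iter=500):
    """Proximal gradient (ISTA) sketch for the ASSUMED objective

        0.5 * ||Y - X B||_F^2 + lam1 * sum_j ||B_j||_1 + lam2 * sum_j ||B_j||_2,

    where X is n x p (e.g. copy numbers), Y is n x q (e.g. expression levels)
    and B_j is row j of the p x q coefficient matrix B. Hypothetical helper,
    not the remMap software.
    """
    B = np.zeros((X.shape[1], Y.shape[1]))
    # Step 1/L, where L = largest eigenvalue of X^T X is the Lipschitz
    # constant of the gradient of the squared-error term.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y)
        B = prox_sparse_group(B - step * grad, step * lam1, step * lam2)
    return B


# Simulated example (hypothetical data): n < p, one "master" predictor.
rng = np.random.default_rng(0)
n, p, q = 50, 100, 20
X = rng.standard_normal((n, p))
B_true = np.zeros((p, q))
B_true[0, :10] = 2.0   # predictor 0 influences responses 0..9
B_true[5, 3] = 1.5     # predictor 5 influences response 3 only
Y = X @ B_true + 0.5 * rng.standard_normal((n, q))

B_hat = fit_l1_l2_multivariate(X, Y, lam1=20.0, lam2=20.0)
active_rows = np.flatnonzero(np.abs(B_hat).sum(axis=1) > 0)
print("predictors with nonzero rows:", active_rows)
```

The penalty levels lam1 = lam2 = 20 are arbitrary. Because of the row-wise L2 term, a predictor either drops out for all responses at once or can stay active for several of them. This is the "master predictor" behaviour the sketch is meant to illustrate. In practice the two tuning parameters would be chosen by a data-driven criterion rather than fixed by hand.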
