Dimension reduction for high-dimensional data.

With the advance of modern technologies, high-dimensional data have become prevalent in computational biology. The number of variables p is very large, and in many applications p exceeds the number of observational units n. Such high dimensionality and the unconventional small-n-large-p setting pose new challenges for statistical analysis methods. Dimension reduction, which aims to reduce the predictor dimension before any modeling effort, offers a potentially useful way to tackle such high-dimensional regression problems. In this chapter, we review a number of commonly used dimension reduction approaches, including principal component analysis, partial least squares, and sliced inverse regression. For each method, we review its background and its applications in computational biology, discuss both its advantages and limitations, and offer enough operational detail for implementation. A numerical example analyzing a microarray survival data set is given to illustrate applications of the reviewed reduction methods.
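As a rough illustration of reducing the predictor dimension before modeling in the small-n-large-p setting, the following is a minimal sketch of principal component reduction followed by ordinary least squares. It is not the chapter's implementation. The function name `pca_reduce`, the simulated data, and the choice of d = 3 components are assumptions made purely for the example.

```python
import numpy as np

def pca_reduce(X, d):
    """Project the n x p matrix X onto its first d principal components.

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)  # center each predictor
    # A thin SVD avoids forming the p x p covariance matrix,
    # which is convenient when p is larger than n.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    directions = Vt[:d].T    # p x d matrix of principal directions
    scores = Xc @ directions # n x d matrix of reduced predictors
    return scores, directions

# Simulated small-n-large-p data (assumed, not from the chapter).
rng = np.random.default_rng(0)
n, p, d = 40, 500, 3
X = rng.normal(size=(n, p))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=n)

# Reduce first, then fit a low-dimensional regression on the scores.
scores, directions = pca_reduce(X, d)
Z = np.column_stack([np.ones(n), scores])  # intercept plus d components
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
```

Note that the principal components are chosen without reference to the response y. Partial least squares and sliced inverse regression, by contrast, use y when constructing the reduced predictors.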
