Borrowing information from relevant microarray studies for sample classification using weighted partial least squares

With a growing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to obtain a more reliable and powerful analysis of a given dataset. Unlike meta-analysis, we do not assume that subjects in the current study and in the other relevant studies are drawn from the same population. In particular, the parameters of the current study may differ from those of the other studies. In this context, we consider sample classification based on gene expression profiles. We propose two new methods, weighted partial least squares (WPLS) and weighted penalized partial least squares (WPPLS), that build a classifier from multiple datasets used in combination. The methods weight each dataset according to its relevance to the current study. A more standard approach first builds a classifier on each individual dataset and then combines the outputs of the multiple classifiers by weighted voting. Using two quite different datasets on human heart failure, we show first that WPLS/WPPLS, by borrowing information from the other dataset, can improve on the performance of PLS/PPLS built on a single dataset alone. Second, WPLS/WPPLS performs better than the standard approach of combining multiple classifiers. Third, WPPLS can improve on WPLS, just as PPLS does on PLS for a single dataset.
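The idea of weighting datasets by relevance can be illustrated with a small sketch. The abstract does not give the WPLS algorithm, so the code below is not the authors' method: it is a generic sample-weighted PLS1 regression in NumPy, assuming that every sample from a dataset carries that dataset's weight and that class labels are coded 0/1. The function names `weighted_pls_fit` and `weighted_pls_predict`, the 0.5 decision threshold, and the choice of two components are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_pls_fit(X, y, sample_weights, n_components=2):
    """Fit a sample-weighted PLS1 regression (illustrative sketch only).

    X: (n_samples, n_genes) expression matrix stacked over all datasets.
    y: (n_samples,) class labels coded 0/1.
    sample_weights: (n_samples,) nonnegative weights; here every sample
        is assumed to carry the relevance weight of its own dataset.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(sample_weights, dtype=float)
    w = w / w.sum()  # normalize so weighted sums are weighted means

    # Weighted centering of predictors and response.
    x_mean = w @ X
    y_mean = w @ y
    Xr = X - x_mean
    yr = y - y_mean

    n_genes = X.shape[1]
    R = np.zeros((n_genes, n_components))  # weight directions
    P = np.zeros((n_genes, n_components))  # X loadings
    q = np.zeros(n_components)             # y loadings

    for k in range(n_components):
        # Direction of maximal weighted covariance with the response.
        r = Xr.T @ (w * yr)
        r /= np.linalg.norm(r)
        t = Xr @ r                          # component scores
        tt = t @ (w * t)                    # weighted squared norm of t
        P[:, k] = Xr.T @ (w * t) / tt
        q[k] = yr @ (w * t) / tt
        # Deflate X and y before extracting the next component.
        Xr = Xr - np.outer(t, P[:, k])
        yr = yr - q[k] * t
        R[:, k] = r

    # Regression coefficients in terms of the original (centered) genes.
    coef = R @ np.linalg.solve(P.T @ R, q)
    return coef, x_mean, y_mean


def weighted_pls_predict(X, coef, x_mean, y_mean):
    """Classify samples by thresholding the fitted response at 0.5."""
    scores = (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
    return (scores > 0.5).astype(int)
```

In this sketch, giving the samples of the current study weight 1 and those of the other study a smaller weight (for example 0.5) lets the second dataset contribute to the components without dominating them. Setting the second weight to 0 reduces the fit to ordinary PLS on the current study alone, and setting both weights equal pools the two datasets.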

[1]  M. Newton Approximate Bayesian-inference With the Weighted Likelihood Bootstrap , 1994 .

[2]  Philip E. Gill,et al.  Practical optimization , 1981 .

[3]  Charles Wang,et al.  Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models , 2004, Comput. Biol. Chem..

[4]  William A. Smith,et al.  Mechanical Circulatory Support--a Long and Winding Road , 2002, Science.

[5]  Douglas M. Hawkins,et al.  Exploring Blood Spectra for Signs of Ovarian Cancer , 2003 .

[6]  H Smetana,et al.  The Permeability of the Renal Glomeruli of Several Mammalian Species to Labelled Proteins. , 1947, American Journal of Pathology.

[7]  W. Wong,et al.  On ψ-Learning , 2003 .

[8]  John Quackenbush,et al.  Orthologous gene-expression profiling in multi-species models: search for candidate genes , 2004, Genome Biology.

[9]  Fred A. Wright,et al.  Entropy and Survival-based Weights to Combine Affymetrix Array Types and Analyze Differential Expression and Survival , 2005 .

[10]  V. Jeevanandam,et al.  Altered myocardial phenotype after mechanical support in human beings with advanced cardiomyopathy. , 1997, The Journal of heart and lung transplantation : the official publication of the International Society for Heart Transplantation.

[11]  S. Wold,et al.  The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses , 1984 .

[12]  Wei Xin,et al.  Dysregulation of the annexin family protein family is associated with prostate cancer progression. , 2003, The American journal of pathology.

[13]  Danh V. Nguyen,et al.  Tumor classification by partial least squares using microarray gene expression data , 2002, Bioinform..

[14]  Jiang Gui,et al.  Threshold Gradient Descent Method for Censored Data Regression with Applications in Pharmacogenomics , 2004, Pacific Symposium on Biocomputing.

[15]  Sangsoo Kim,et al.  Integrative analysis of multiple gene expression profiles applied to liver cancer study , 2004, FEBS letters.

[16]  T. Barrette,et al.  Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. , 2002, Cancer research.

[17]  D. Ghosh Penalized Discriminant Methods for the Classification of Tumors from Gene Expression Data , 2003, Biometrics.

[18]  Giovanni Parmigiani,et al.  A Cross-Study Comparison of Gene Expression Studies for the Molecular Classification of Lung Cancer , 2004, Clinical Cancer Research.

[19]  Kevin R. Coombes,et al.  Differences in gene expression between B-cell chronic lymphocytic leukemia and normal B cells: a meta-analysis of three microarray studies , 2004, Bioinform..

[20]  Wei Pan,et al.  Linear regression and two-class classification with gene expression data , 2003, Bioinform..

[21]  Xinqiang Han,et al.  Genomic profiling of the human heart before and after mechanical support with a ventricular assist device reveals alterations in vascular signaling networks. , 2004, Physiological genomics.

[22]  Ivo Grosse,et al.  Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression , 2004, J. Comput. Biol..

[23]  M C Oz,et al.  Transient normalization of systolic and diastolic function after support with a left ventricular assist device in a patient with dilated cardiomyopathy. , 1996, The Journal of heart and lung transplantation : the official publication of the International Society for Heart Transplantation.

[24]  Wei Pan,et al.  Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares , 2004, Bioinform..

[25]  Bogdan E. Popescu,et al.  Gradient Directed Regularization for Linear Regression and Classi…cation , 2004 .

[26]  M C Oz,et al.  Long-term use of a left ventricular assist device for end-stage heart failure. , 2001, The New England journal of medicine.

[27]  Xiwu Lin,et al.  Making Sense of Human Lung Carcinomas Gene Expression Data: Integration and Analysis of Two Affymetrix Platform Experiments , 2005 .

[28]  Debashis Ghosh,et al.  Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer , 2003, Functional & Integrative Genomics.

[29]  J. Zidek,et al.  Asymptotic properties of maximum weighted likelihood estimators , 2004 .

[30]  Jun Chen,et al.  Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes , 2004, BMC Bioinformatics.

[31]  Tapabrata Maiti,et al.  The use of the weighted likelihood in the natural exponential families with quadratic variance , 2004 .

[32]  Feifang Hu,et al.  The weighted likelihood , 2002 .

[33]  Jiang Gui,et al.  Partial Cox regression analysis for high-dimensional microarray gene expression data , 2004, ISMB/ECCB.

[34]  Wei Pan,et al.  A comparative study of discriminating human heart failure etiology using gene expression profiles , 2005, BMC Bioinformatics.

[35]  A. Boulesteix PLS Dimension Reduction for Classification with Microarray Data , 2004, Statistical applications in genetics and molecular biology.

[36]  Debashis Ghosh,et al.  Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data , 2004, BMC Genomics.