A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis

Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and often contain significant noise, missing components, or outliers. Features extracted from such data need to be discriminative and sparse, and should capture the essential characteristics of the data. In this paper, we present a method for constructing multivariate features and using them to classify the data into the proper classes. The resulting small subset of features is nearly optimal in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting these biases. For large-scale problems, we use conjugate gradient-based primal-dual interior-point techniques. We apply the procedure to microarray gene analysis, and the experimental results confirm the effectiveness of our method.
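The abstract describes a two-stage pipeline: an l1-constrained sparse fit that selects a small gene subset, followed by a bias-correction step on the selected weights. The paper's exact formulation and its conjugate gradient-based interior-point solver are not reproduced here; the minimal Python sketch below substitutes off-the-shelf l1-penalized logistic regression as a stand-in, uses synthetic data in place of a real microarray matrix, and treats the penalty settings and variable names as illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for microarray data: n samples (rows), p genes
# (columns), with p >> n as in the high-dimensional setting of the paper.
rng = np.random.default_rng(0)
n, p = 60, 2000
X = rng.standard_normal((n, p))
y = (X[:, :5].sum(axis=1) + 0.5 * rng.standard_normal(n) > 0).astype(int)

# Stage 1: sparse fit. The l1 penalty drives most gene weights exactly to
# zero, leaving a small subset of discriminative features (a stand-in for
# the paper's l1-constrained, nearly persistent feature construction).
sparse_fit = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
sparse_fit.fit(X, y)
selected = np.flatnonzero(sparse_fit.coef_.ravel())
print(f"{selected.size} of {p} genes selected")

# Stage 2: bias correction. The l1 penalty shrinks the surviving weights
# toward zero, so refit an essentially unpenalized model (huge C) on the
# selected genes only, mirroring the debiasing step in the abstract.
debiased_fit = LogisticRegression(C=1e6, solver="lbfgs", max_iter=1000)
debiased_fit.fit(X[:, selected], y)
```

Refitting with a negligible penalty on only the selected genes removes most of the shrinkage bias that the l1 constraint imposes on the surviving weights; at the scale of real microarray problems, this is where a conjugate gradient-based primal-dual interior-point solver would replace the generic solver used in this sketch.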

[1] N. Cristianini et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000.

[2] P. J. Rousseeuw et al. Robust Regression and Outlier Detection. 1987.

[3] S. Dudoit et al. Comparison of discrimination methods for the classification of tumors using gene expression data. 2002.

[4] R. Tibshirani et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 2002.

[5] W. Pan et al. Linear regression and two-class classification with gene expression data. Bioinformatics, 2003.

[6] J. Mesirov et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999.

[7] S. P. Boyd et al. Convex Optimization. Cambridge University Press, 2004.

[8] S. Fidler et al. Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.

[9] J. Fan et al. High dimensional classification using features annealed independence rules. Annals of Statistics, 2007.

[10] K.-C. Li. Sliced inverse regression for dimension reduction (with discussion). 1991.

[11] X. Huo et al. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 2001.

[12] R. Li et al. Statistical challenges with high dimensionality: feature selection in knowledge discovery. 2006.

[13] H. T. Shen et al. Principal Component Analysis. Encyclopedia of Biometrics, 2009.

[14] A. C. Kak et al. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.

[15] B. Kowalski et al. Partial least-squares regression: a tutorial. 1986.

[16] R. M. Pfeiffer et al. Graphical methods for class prediction using dimension reduction techniques on DNA microarray data. Bioinformatics, 2003.

[17] E. Greenshtein. Best subset selection, persistence in high-dimensional statistical learning and optimization under l1 constraint. 2006.

[18] E. R. Ziegel. The Elements of Statistical Learning. Technometrics, 2003.

[19] P. Bickel et al. Some theory for Fisher's linear discriminant function. 2004.

[20] D. Ghosh et al. Singular value decomposition regression models for classification of tumors from microarray experiments. Pacific Symposium on Biocomputing, 2001.

[21] J. Welsh et al. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Research, 2001.

[22] D. P. Foster et al. The risk inflation criterion for multiple regression. 1994.

[23] D. V. Nguyen et al. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 2002.

[24] R. Tibshirani et al. Prediction by supervised principal components. 2006.

[25] B. Schölkopf et al. A direct method for building sparse kernel learning algorithms. Journal of Machine Learning Research, 2006.

[26] R. Kohavi et al. Feature Selection for Knowledge Discovery and Data Mining. 1998.

[27] P. J. Rousseeuw et al. Robust Regression and Outlier Detection. Wiley Series in Probability and Statistics, 2005.

[28] M. Elad et al. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Transactions on Information Theory, 2002.

[29] M. Barker et al. Partial least squares for discrimination. 2003.

[30] E. Lander et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 2002.

[31] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. 2006.

[32] L. Shen et al. PLS and SVD based penalized logistic regression for cancer classification using microarray data. APBC, 2005.

[33] S. Lambert-Lacroix et al. Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics, 2003.

[34] R. Tibshirani et al. 1-norm support vector machines. NIPS, 2003.

[35] K.-C. Li. Sliced inverse regression for dimension reduction. 1991.

[36] V. N. Vapnik. The Nature of Statistical Learning Theory. 2000.

[37] V. Vapnik. Statistical Learning Theory. 1998.

[38] F. Chiaromonte et al. Dimension reduction strategies for analyzing global gene expression data with a response. Mathematical Biosciences, 2002.