Sparsest Matrix based Random Projection for Classification

As a typical dimensionality reduction technique, random projection has been widely applied in a variety of classification-related fields. The construction of random projection matrices has also been studied in depth, guided by the principle of preserving the pairwise distances of data projected from a high-dimensional space onto a low-dimensional subspace. Since random projection is mainly exploited for classification tasks, this paper instead studies it from the viewpoint of feature selection rather than the traditional distance preservation. This yields a somewhat surprising result: in theory, the sparsest random matrix, with only one nonzero element in each column, can achieve better feature selection performance than denser matrices. Extensive experiments on binary classification confirm this theoretical conjecture. The result is attractive for dimensionality reduction, since it simultaneously reduces computational complexity and improves performance.
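
To make the "one nonzero per column" construction concrete, the sketch below builds such a projection matrix and applies it to data. It is a minimal illustration only, assuming NumPy, random ±1 values for the nonzeros, a uniformly chosen row index per column, and no normalization constant; these choices are assumptions for illustration, not necessarily the exact construction analyzed in the paper.

```python
import numpy as np

def sparsest_random_projection(d, k, seed=None):
    """Build a k x d projection matrix with exactly one nonzero entry per column.

    Each original feature (column) is mapped to a single randomly chosen output
    dimension (row) with a random sign, so projecting a sample amounts to a
    signed pooling of its features. Scaling constants are omitted here.
    """
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, k, size=d)        # output dimension assigned to each feature
    signs = rng.choice([-1.0, 1.0], size=d)  # sign of the single nonzero in each column
    R = np.zeros((k, d))
    R[rows, np.arange(d)] = signs
    return R

# Usage: project 1000-dimensional samples down to 50 dimensions.
X = np.random.randn(200, 1000)               # 200 samples, 1000 features
R = sparsest_random_projection(d=1000, k=50, seed=0)
X_low = X @ R.T                              # projected data, shape (200, 50)
```

Because every column holds a single nonzero, the projection costs only one addition or subtraction per original feature, which is the complexity advantage the abstract refers to.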
