On node selection for classification in correlated data sets

Consider a system that can be in one of a finite number of states. Given a large number of measured characteristics representing the system, we address the selection of a subset of characteristics of given (small) cardinality for which the classification of the system into one of the states is optimal under the Rayleigh quotient criterion. This problem arises wherever a few explanatory variables must be selected from a large set of candidates, including sensor selection in sensor networks, classification in image processing, and feature selection in data mining for bioinformatics applications. We show that the optimization amounts to finding the submatrix of the feature covariance matrix for which the sum of the elements of its inverse is maximized, and we present bounds that relate this optimization to a similar metric based on the elements of the original covariance matrix.
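The objective stated above can be illustrated with a small sketch. It assumes only what the abstract says: for a candidate index set, take the corresponding principal submatrix of the feature covariance matrix and score it by the sum of the elements of its inverse. The function names (inverse_sum, best_subset), the example matrix, and the exhaustive search are illustrative assumptions, not the authors' method; brute force is practical only for small problems.

```python
from itertools import combinations

import numpy as np


def inverse_sum(cov, subset):
    """Sum of all entries of the inverse of the principal submatrix of cov."""
    idx = np.array(subset)
    sub = cov[np.ix_(idx, idx)]      # rows and columns restricted to the subset
    ones = np.ones(len(idx))
    # ones^T sub^{-1} ones equals the sum of the entries of sub^{-1};
    # solving a linear system avoids forming the inverse explicitly.
    return float(ones @ np.linalg.solve(sub, ones))


def best_subset(cov, k):
    """Exhaustive search over all size-k subsets (illustrative only)."""
    n = cov.shape[0]
    return max(combinations(range(n), k), key=lambda s: inverse_sum(cov, s))


# Hypothetical 3-feature covariance: features 0 and 1 are strongly
# correlated, feature 2 is uncorrelated with both.
cov = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 2.0, 0.0],
    [0.0, 0.0, 1.0],
])

chosen = best_subset(cov, 2)
score = inverse_sum(cov, chosen)
print(chosen, score)
```

In this toy case the search prefers pairing feature 0 with the uncorrelated feature 2 over pairing it with the positively correlated feature 1, which is the kind of trade-off between variance and correlation that the criterion captures.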
