Local Feature Selection for Data Classification

Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary in both membership and size across the sample space. This allows the feature set to adapt optimally to local variations in the sample space. An associated method for measuring the similarity of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence it is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate that the method is robust against over-fitting. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition, we show several examples where localized feature selection produces better results than a global feature selection method.
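
To give a concrete flavour of the idea, the following is a minimal sketch, not the paper's formulation: for each training sample, a small linear program (solved here with SciPy's `linprog`) chooses non-negative feature weights that keep same-class neighbours close while pushing other-class neighbours beyond a unit margin, and a query is assigned to the class whose local model yields the smallest weighted distance. The specific objective, the margin constraint, the `budget` parameter, and the helper names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def local_feature_weights(X, y, i, budget=5.0):
    """Feature weights for the region around training sample i (illustrative LP).

    Minimize the mean weighted L1 distance to same-class samples, subject to the
    mean weighted L1 distance to other-class samples exceeding a unit margin and
    the total weight staying below `budget`.
    """
    xi, d = X[i], X.shape[1]
    same = np.abs(X[y == y[i]] - xi).mean(axis=0)   # per-feature mean |x_j - x_i|, same class
    diff = np.abs(X[y != y[i]] - xi).mean(axis=0)   # per-feature mean |x_j - x_i|, other classes

    c = same                                        # objective: keep same-class neighbours close
    A_ub = np.vstack([-diff, np.ones(d)])           # enforce diff @ w >= 1 and sum(w) <= budget
    b_ub = np.array([-1.0, budget])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * d, method="highs")
    return res.x if res.success else np.ones(d) / d # fall back to uniform weights

def classify(X, y, query, budget=5.0):
    """Assign `query` to the class whose local models give the smallest weighted distance."""
    best = {}
    for i in range(len(X)):
        w = local_feature_weights(X, y, i, budget)
        dist = float(np.dot(w, np.abs(query - X[i])))
        best[y[i]] = min(best.get(y[i], np.inf), dist)
    return min(best, key=best.get)
```

Restricting each weight to [0, 1] and bounding their sum acts as a relaxed cardinality constraint, which is the standard way an integer feature-subset problem is relaxed into a linear program.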
