Iterative column subset selection

Dimensionality reduction is often a crucial step in successfully applying machine learning and data mining methods. One way to achieve such a reduction is feature selection. Because many data sets cannot feasibly be labelled, unsupervised approaches are frequently the only option. The column subset selection problem translates naturally to this setting and has received considerable attention in recent years, as it yields simple linear models for low-rank data reconstruction. Recently, it was shown empirically that an iterative algorithm, which admits an efficient implementation, selects better subsets than other state-of-the-art methods. In this paper, we describe this algorithm and provide a more in-depth analysis. We carry out numerous experiments to gain insight into its behaviour and derive a simple bound on the norm recovered by the resulting matrix. To the best of our knowledge, this is the first theoretical result of this kind for this algorithm.
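The abstract does not spell out the iterative algorithm itself, but the underlying objective it describes can be made concrete. In column subset selection, one picks a set of k columns C from the data matrix A and reconstructs A by projecting onto their span, A ≈ C C⁺ A; the quality of a subset is the Frobenius norm it recovers. The sketch below is a hedged illustration of this objective using a plain greedy baseline (not the paper's iterative method), with hypothetical function names:

```python
import numpy as np

def recovered_norm(A, cols):
    """Frobenius norm of A captured by the selected columns.

    Reconstructs A as P @ A, where P = C @ pinv(C) is the orthogonal
    projector onto the span of C = A[:, cols]."""
    C = A[:, cols]
    P = C @ np.linalg.pinv(C)  # projector onto span(C)
    return np.linalg.norm(P @ A, "fro")

def greedy_css(A, k):
    """Baseline greedy selection: repeatedly add the column that most
    increases the recovered Frobenius norm. Illustrative only; the
    paper's iterative algorithm is not reproduced here."""
    selected = []
    remaining = list(range(A.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: recovered_norm(A, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that the recovered norm can never exceed ‖A‖_F, and it equals ‖A‖_F exactly when the selected columns span the full column space of A; a bound of this flavour is what the abstract refers to.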
