Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

Identifying a small number of features that can represent the data is a well-known problem that arises in areas such as machine learning, knowledge representation, data mining, and numerical linear algebra. Computing an optimal solution is believed to be NP-hard, and there is extensive work on approximation algorithms. Classic approaches exploit the algebraic structure of the underlying matrix, while more recent approaches use randomization. An entirely different approach, which uses the A* heuristic search algorithm to find an optimal solution, was recently proposed. Not surprisingly, it can effectively select only a small number of features. We propose a related approach based on the Weighted A* algorithm. This gives algorithms that are not guaranteed to find an optimal solution but run much faster than the A* approach, enabling effective selection of many features from large datasets. We demonstrate experimentally that these new algorithms are more accurate than the current state of the art while still being practical. Furthermore, they come with an adjustable guarantee on how far their error can be from the smallest possible (optimal) error. Their accuracy can always be increased at the expense of a longer running time.
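To make the Weighted A* idea concrete, the sketch below shows one way such a search could look for column subset selection under the standard squared-Frobenius-error objective. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the heuristic, queue layout, and the names `residual`, `lower_bound`, and `weighted_astar_css` are all assumptions introduced here.

```python
# A minimal sketch of Weighted-A*-style search for column subset selection.
# Assumption-laden reconstruction for illustration; NOT the paper's algorithm.
import heapq
import itertools
import numpy as np

def residual(A, cols):
    """Residual of A after projecting onto the span of the chosen columns."""
    if not cols:
        return A.copy()
    Q, _ = np.linalg.qr(A[:, sorted(cols)])
    return A - Q @ (Q.T @ A)

def lower_bound(A, cols, k):
    """Optimistic (admissible) estimate of the smallest error reachable by
    extending `cols` to k columns: the extra columns add at most
    k - len(cols) dimensions, so the final error is at least the best
    rank-(k - len(cols)) approximation error of the current residual."""
    s = np.linalg.svd(residual(A, cols), compute_uv=False)
    return float(np.sum(s[k - len(cols):] ** 2))

def weighted_astar_css(A, k, w=2.0):
    """Select k columns of A. With w >= 1 the error of the returned subset
    is at most w times optimal; w = 1 recovers exact A*-style search, and
    larger w expands fewer nodes at the cost of a looser guarantee."""
    n = A.shape[1]
    tie = itertools.count()  # break priority ties without comparing sets
    frontier = [(w * lower_bound(A, frozenset(), k), next(tie), frozenset())]
    expanded = set()
    while frontier:
        priority, _, S = heapq.heappop(frontier)
        if len(S) == k:
            # Goal priorities are exact errors, so the first goal popped is
            # within a factor w of every optimistic bound left on the queue.
            return sorted(S), priority
        if S in expanded:
            continue
        expanded.add(S)
        start = max(S) + 1 if S else 0  # generate each subset exactly once
        for j in range(start, n):
            T = S | {j}
            if len(T) == k:
                R = residual(A, T)
                heapq.heappush(frontier, (float(np.sum(R * R)), next(tie), T))
            else:
                heapq.heappush(frontier, (w * lower_bound(A, T, k), next(tie), T))
    raise ValueError("k exceeds the number of columns")
```

Under these assumptions, a call such as `weighted_astar_css(A, k=5, w=1.5)` would return column indices whose reconstruction error is provably within a factor 1.5 of optimal, mirroring the adjustable accuracy/runtime trade-off described above: decreasing w tightens the guarantee while expanding more search nodes.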
