Heuristic Search Algorithm for Dimensionality Reduction Optimally Combining Feature Selection and Feature Extraction

There are two classical approaches to dimensionality reduction: approximating the data with a small number of features that exist in the data (feature selection), and approximating the data with a small number of arbitrary features (feature extraction). We study a generalization that approximates the data with both selected and extracted features. We show that an optimal solution to this hybrid problem requires a combinatorial search and cannot be obtained trivially even when the separate problems of selection and extraction can each be solved optimally. Our approach, which yields both optimal and approximate solutions, uses a "best-first" heuristic search. The algorithm comes with both an a priori and an a posteriori optimality guarantee, similar to those available for the classical weighted A* algorithm. Experimental results show the effectiveness of the proposed approach.
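To make the hybrid objective concrete, the Python sketch below evaluates one natural formulation of it: approximate a data matrix A by k selected columns together with r extracted features, scoring a subset by the squared Frobenius error left after projecting onto the selected columns and removing the top-r singular directions of the residual. It then runs a best-first search over column subsets. The names (`hybrid_error`, `best_first_hybrid`), the weight parameter `w`, and the particular lower bound used as the heuristic are illustrative assumptions rather than the paper's implementation; the heuristic merely exploits the fact that letting the still-unchosen selected columns be unconstrained extracted features can only reduce the error, so it lower-bounds the best completion of a partial subset.

```python
import heapq
import numpy as np


def hybrid_error(A, cols, r):
    """Squared Frobenius error of approximating A by the columns in `cols`
    plus the best r extracted (rank-r) features: project A onto the span of
    the selected columns, then discard the top-r singular directions of the
    residual."""
    if len(cols) > 0:
        Q, _ = np.linalg.qr(A[:, list(cols)])  # orthonormal basis of selected columns
        R = A - Q @ (Q.T @ A)                  # residual outside their span
    else:
        R = A
    s = np.linalg.svd(R, compute_uv=False)     # singular values of the residual
    return float(np.sum(s[r:] ** 2))           # energy not captured by the top r


def best_first_hybrid(A, k, r, w=1.0):
    """Best-first search over column subsets.  With w = 1 the bound is
    admissible, so the first complete subset popped is optimal; w > 1
    inflates partial-node priorities, trading optimality for speed with a
    weighted-A*-style factor-w guarantee."""
    n = A.shape[1]

    def bound(S):
        # Lower bound on any completion of S: pretend the k - |S| remaining
        # selections are free extracted features, which can only help.
        return hybrid_error(A, S, r + (k - len(S)))

    heap = [(w * bound(()), ())]
    while heap:
        f, S = heapq.heappop(heap)
        if len(S) == k:                        # complete: priority is its true error
            return list(S), f
        start = S[-1] + 1 if S else 0          # extend in index order so each subset is generated once
        for j in range(start, n):
            T = S + (j,)
            prio = hybrid_error(A, T, r) if len(T) == k else w * bound(T)
            heapq.heappush(heap, (prio, T))
    return None


# Example: pick 2 columns and 1 extracted feature for a random 20 x 8 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
cols, err = best_first_hybrid(A, k=2, r=1)
print(cols, err)
```

In this sketch the two guarantee flavors show up naturally: with w = 1 the pop order certifies optimality a priori, and with w > 1 comparing the returned error to the smallest unweighted bound remaining on the frontier yields an a posteriori suboptimality factor, in the spirit of weighted A*.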
