Massively parallel feature selection: an approach based on variance preservation

Advances in computing technology have enabled corporations to accumulate data at unprecedented speed. Large-scale business data may contain billions of observations and thousands of features, easily reaching terabyte scale. Most traditional feature selection algorithms are designed and implemented for a centralized computing architecture, and their usability deteriorates significantly once data size exceeds tens of gigabytes. High-performance distributed computing frameworks and protocols, such as the Message Passing Interface (MPI) and MapReduce, facilitate software development on grid infrastructures and enable analysts to process large-scale problems efficiently. This paper presents a novel large-scale feature selection algorithm based on variance analysis: features are selected by evaluating their ability to explain data variance. The algorithm supports both supervised and unsupervised feature selection and can be readily implemented in most distributed computing environments. It was implemented as a SAS High-Performance Analytics procedure, which reads data in distributed form and performs parallel feature selection in both symmetric multiprocessing (SMP) and massively parallel processing (MPP) modes. Experimental results demonstrate the superior performance of the proposed method for large-scale feature selection.
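The core idea, selecting features by their ability to explain data variance, can be illustrated with a small greedy sketch. This is an assumption-laden toy version, not the paper's actual distributed procedure: it scores a candidate subset by the variance of the full covariance matrix that is explained by projecting onto the selected columns, and the function name `greedy_variance_selection` is hypothetical.

```python
import numpy as np

def greedy_variance_selection(X, k):
    """Greedily pick k features whose span explains the most data variance.

    A centralized, illustrative sketch of the variance-preservation idea;
    the paper's algorithm operates on distributed data in parallel.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)           # center the data
    C = Xc.T @ Xc / (n - 1)           # d x d sample covariance matrix
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for j in range(d):
            if j in selected:
                continue
            S = selected + [j]
            Css = C[np.ix_(S, S)]     # covariance among candidate subset
            Cs = C[:, S]              # covariance of all features with subset
            # total variance explained by projecting onto the subset's span
            gain = np.trace(Cs @ np.linalg.pinv(Css) @ Cs.T)
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected
```

In a distributed setting, only the covariance matrix needs to be aggregated: each worker computes partial sums of `X.T @ X` and column means over its local rows, and a single reduction yields `C`, after which the selection loop runs on aggregated statistics rather than raw data.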
