A Parallel Computing Hybrid Approach for Feature Selection

The ultimate goal of feature selection is to select, from an original set of features, the smallest subset that yields the minimum generalization error. This reduces the dimensionality of the feature space and thus the complexity of the classifiers. Although several algorithms have been proposed, no single one outperforms all the others in all scenarios, and the problem remains an active area of research. This paper proposes a new hybrid parallel approach to feature selection. The idea is to use a filter metric to reduce the feature space and then use an innovative wrapper method to search extensively for the best solution. The proposed strategy is implemented in a shared-memory parallel environment to speed up the process. We evaluated its parallel performance using up to 32 cores, and our results show a 30-fold speedup. To test feature selection performance, we used five datasets from the well-known NIPS challenge and obtained an average score of 95.90% across all solutions.
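The filter-then-wrapper idea can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not name the filter metric, the wrapper search, or the classifier, so everything below is an assumption chosen for brevity. The filter is absolute Pearson correlation with the label, the wrapper is an exhaustive search over the filtered candidates scored by leave-one-out accuracy of a nearest-centroid classifier, and the shared-memory parallelism is a thread pool. All function names and the toy data are hypothetical.

```python
import itertools
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_toy_data(n=40, n_features=10, seed=0):
    """Hypothetical data: feature 0 tracks the label, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[4.0 * label + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(n_features - 1)]
         for label in y]
    return X, y


def pearson_abs(xs, ys):
    """Assumed filter metric: |Pearson correlation| of a feature with the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def filter_stage(X, y, keep):
    """Rank all features by the filter metric and keep the `keep` best indices."""
    scores = [pearson_abs([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]


def loo_accuracy(X, y, subset):
    """Assumed wrapper score: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(subset))
            for pos, j in enumerate(subset):
                acc[pos] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared distance from sample i to the class centroid.
            return sum((X[i][j] - sums[label][pos] / counts[label]) ** 2
                       for pos, j in enumerate(subset))

        if min(sums, key=dist) == y[i]:
            correct += 1
    return correct / len(X)


def wrapper_stage(X, y, candidates, workers=4):
    """Score every non-empty subset of the candidates in parallel.

    Returns the subset with the highest accuracy, preferring smaller
    subsets when accuracies tie.
    """
    subsets = [s for r in range(1, len(candidates) + 1)
               for s in itertools.combinations(candidates, r)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: loo_accuracy(X, y, s), subsets))
    best = max(range(len(subsets)), key=lambda i: (scores[i], -len(subsets[i])))
    return subsets[best], scores[best]


X, y = make_toy_data()
candidates = filter_stage(X, y, keep=4)           # filter: 10 features -> 4
best, acc = wrapper_stage(X, y, candidates)       # wrapper: search 15 subsets
print("candidates:", candidates)
print("selected subset:", best, "LOO accuracy:", acc)
```

The filter stage is what makes the extensive wrapper search affordable: an exhaustive search over k candidates evaluates 2^k - 1 subsets, so shrinking k first bounds the work. Because each subset is scored independently, the evaluations parallelize naturally. In CPython, threads share memory but the global interpreter lock limits speedup for pure-Python scoring, so a process pool or a natively compiled scorer would be needed to approach the kind of scaling reported above.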
