An Integrated Approach to Speed Up GA-SVM Feature Selection Model

Significant information or features are often overshadowed by noise, resulting in poor classification results. Feature selection methods such as GA-SVM, which couples a genetic algorithm (GA) with a support vector machine (SVM), are desirable for filtering out irrelevant features and thus improving accuracy; the selected features may also offer critical insights into the problem itself. However, the high computational cost of GA-SVM greatly discourages its application, especially to large-scale datasets. In this paper, a high-performance computing (HPC) enabled GA-SVM (HGA-SVM) is proposed and implemented by integrating data parallelization, multithreading and heuristic techniques, with the goal of maintaining robustness while lowering computational cost. The proposed model comprises four improvement strategies: 1) GA Parallelization, 2) SVM Parallelization, 3) Neighbor Search and 4) Evaluation Caching. Each strategy improves a different aspect of the feature selection algorithm, and together they contribute to higher computational throughput.
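To make three of these strategies concrete, here is a minimal sketch of a GA wrapper for feature selection, under stated assumptions; it is not the HGA-SVM implementation. Chromosomes are 0/1 feature masks. The fitness function `evaluate_subset` is a hypothetical toy stand-in for SVM cross-validation accuracy, so the example runs without an SVM library. `CachedEvaluator` illustrates evaluation caching (each distinct mask is scored once) and GA-level parallelization (new masks in a population are scored concurrently). `neighbor_search` is an assumed interpretation of Neighbor Search as a one-bit-flip hill climb around the best mask. All names (`RELEVANT`, `N_FEATURES`, `ga_select`, etc.) are hypothetical. SVM Parallelization (parallelism inside a single SVM training run) is not shown.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy problem: features 0, 3 and 5 are "relevant".
RELEVANT = {0, 3, 5}
N_FEATURES = 8

def evaluate_subset(mask):
    """Toy fitness standing in for SVM cross-validation accuracy (assumption).
    Rewards relevant features and penalizes every selected feature slightly."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    return hits - 0.1 * sum(mask)

class CachedEvaluator:
    """Evaluation caching plus parallel scoring of a GA population."""
    def __init__(self, fn):
        self.fn = fn
        self.cache = {}   # mask -> fitness
        self.calls = 0    # number of distinct masks actually scored

    def evaluate_population(self, population, workers=4):
        # Only masks never seen before are scored, concurrently.
        new = list({m for m in population if m not in self.cache})
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for mask, score in zip(new, pool.map(self.fn, new)):
                self.cache[mask] = score
        self.calls += len(new)
        return [self.cache[m] for m in population]

def neighbor_search(best, evaluator):
    """Assumed neighbor search: move to the best one-bit-flip neighbor if it improves."""
    neighbors = [best[:i] + (1 - best[i],) + best[i + 1:] for i in range(len(best))]
    scores = evaluator.evaluate_population(neighbors)
    top = max(range(len(neighbors)), key=scores.__getitem__)
    return neighbors[top] if scores[top] > evaluator.cache[best] else best

def ga_select(pop_size=20, generations=15, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    evaluator = CachedEvaluator(evaluate_subset)
    pop = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = evaluator.evaluate_population(pop)
        ranked = [m for _, m in sorted(zip(scores, pop), reverse=True)]
        parents = ranked[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FEATURES)     # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            children.append(child)
        pop = parents + children
        pop[0] = neighbor_search(ranked[0], evaluator)  # refine the elite
    scores = evaluator.evaluate_population(pop)
    best_score, best_mask = max(zip(scores, pop))
    return best_mask, best_score, evaluator.calls
```

On this toy problem, `ga_select()` should return the mask selecting features 0, 3 and 5. The returned `calls` counts distinct masks actually scored, which the cache bounds by the 2^8 possible masks no matter how many generations run. Threads are used only to keep the sketch short. In Python, CPU-bound fitness code would typically need a process pool or a distributed scheme to gain real speedup because of the global interpreter lock.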
