Novel Randomized Feature Selection Algorithms

Feature selection is the problem of identifying a subset of the most relevant features for model construction. This problem has been well studied and plays a vital role in machine learning. In this paper we present three randomized algorithms for feature selection. They are generic in nature and can be applied to any learning algorithm. The proposed algorithms can be thought of as random walks in the space of all possible subsets of the features. We demonstrate the generality of our approaches using three different applications. The simulation results show that our feature selection algorithms outperform some of the best-known algorithms in the current literature.
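To make the random-walk view concrete, the following is a minimal Python sketch of one way such a walk over feature subsets could look. It is not one of the three algorithms of the paper, whose details are not given in this abstract. The names random_walk_select, toy_score, and RELEVANT are hypothetical, the acceptance rule (keep a move only if the score does not drop) is an assumption, and toy_score is a stand-in for the validation accuracy that a real learning algorithm would supply.

```python
import random


def random_walk_select(n_features, score, steps=500, seed=0):
    """Illustrative random walk over feature subsets (not the paper's method)."""
    rng = random.Random(seed)
    # Start from a random subset, encoded as one boolean per feature.
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    for _ in range(steps):
        # Propose a neighboring subset by flipping one randomly chosen feature.
        i = rng.randrange(n_features)
        candidate = current.copy()
        candidate[i] = not candidate[i]
        candidate_score = score(candidate)
        # Assumed acceptance rule: keep the move only if the score does not drop.
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score
    return current, current_score


# Hypothetical objective standing in for a learner's validation accuracy:
# reward the truly relevant features, lightly penalize every extra one.
RELEVANT = {0, 3, 5}


def toy_score(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(1 for i, on in enumerate(mask) if on and i not in RELEVANT)
    return hits - 0.1 * extras


mask, best = random_walk_select(8, toy_score)
selected = [i for i, on in enumerate(mask) if on]
print(selected, best)
```

Because the walk only calls score on a candidate subset, any learning algorithm can be plugged in by replacing toy_score with a function that trains and evaluates a model on the selected features. A variant that sometimes accepts worse moves, as in simulated annealing, would change only the acceptance test.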
