Feature selection based on inference correlation

Feature selection is a critical preprocessing step in machine learning: it reduces the cost of model building and can improve prediction performance. A feature selection algorithm generally requires two components: a dependency measure and a search strategy. Most existing dependency measures are based on pairwise correlation analysis and therefore cannot detect feature interaction. To overcome this limitation, we developed a unified dependency criterion called inference correlation. The inference correlation between a set of predictor variables and a response variable can be computed efficiently, and the variables may be discrete, continuous, or mixed; inference correlation can therefore be used to select features for both classification and regression problems. We present a feature selection algorithm that combines sequential floating forward search with inference correlation. Experiments on synthetic datasets and real-world problems confirm the effectiveness of the approach compared with existing feature selection methods.
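The abstract does not reproduce the definition of inference correlation, so the sketch below illustrates only the search half of the method: a sequential floating forward search (SFFS) wrapper with a pluggable dependency criterion. The function names, the `dependency` parameter, and the `r2_subset` stand-in score (squared multiple correlation of the response on the candidate subset) are assumptions made for illustration; in the paper's algorithm, inference correlation would take the place of the stand-in.

```python
import numpy as np

def r2_subset(X, y, subset):
    """Stand-in dependency measure (an assumption, NOT the paper's
    inference correlation): squared multiple correlation R^2 of y
    regressed on the selected columns plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def sffs(X, y, k, dependency=r2_subset):
    """Sequential floating forward search: grow the subset greedily,
    and after every inclusion try conditional exclusions that improve
    on the best subset of the smaller size found so far."""
    n_features = X.shape[1]
    selected = []
    best = {}  # subset size -> (score, subset)
    while len(selected) < k:
        # Forward step: include the feature with the largest criterion gain.
        remaining = (f for f in range(n_features) if f not in selected)
        score, f_add = max((dependency(X, y, selected + [f]), f)
                           for f in remaining)
        selected.append(f_add)
        if score > best.get(len(selected), (-np.inf,))[0]:
            best[len(selected)] = (score, list(selected))
        # Floating step: drop a feature whenever the reduced subset beats
        # the best recorded subset of that size (strict > avoids cycling).
        while len(selected) > 2:
            score_r, f_drop = max(
                (dependency(X, y, [g for g in selected if g != f]), f)
                for f in selected)
            if score_r > best.get(len(selected) - 1, (-np.inf,))[0]:
                selected.remove(f_drop)
                best[len(selected)] = (score_r, list(selected))
            else:
                break
    return best[k][1]

# Toy usage: recover the two linearly relevant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 1] - X[:, 4] + 0.1 * rng.normal(size=200)
print(sorted(sffs(X, y, k=2)))  # expected: [1, 4]
```

Because the dependency criterion scores whole subsets rather than individual features, the same wrapper can in principle exploit a measure that captures feature interaction, which is the motivation for inference correlation; the linear R^2 stand-in used here cannot.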
