Searching for interacting features in subset selection

The evolving and adapting capabilities of robust intelligence are best manifested in its ability to learn. Machine learning enables computer systems to learn and improve their performance. Feature selection facilitates machine learning (e.g., classification) by removing irrelevant features. Feature (attribute) interaction presents a challenge to feature subset selection for classification: a feature by itself might have little correlation with the target concept, yet be strongly correlated with it when combined with other features. Unintentionally removing such features may therefore result in poor classification performance. Handling feature interactions in general is computationally intractable. However, the presence of feature interaction in a wide range of real-world applications demands practical solutions that reduce high-dimensional data while preserving feature interactions. In this paper, we take up the challenge of designing a special data structure for feature quality evaluation and employing an information-theoretic feature ranking mechanism to handle feature interaction efficiently in subset selection. We conduct experiments to evaluate our approach against representative methods, perform a lesion study to examine the critical components of the proposed algorithm, and investigate related issues such as data structure, ranking, time complexity, and scalability in the search for interacting features.
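The interaction problem described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the algorithm proposed in the paper: the target is assumed to be the XOR of two binary features, and the helper `mutual_information` and the variable names are hypothetical. Each feature alone carries no information about the target, while the pair determines it completely.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy data (an assumption for illustration): all combinations of two
# binary features, with the target defined as their XOR.
rows = list(product([0, 1], repeat=2))
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
target = [a ^ b for a, b in rows]

mi_f1 = mutual_information(f1, target)                   # feature 1 alone
mi_f2 = mutual_information(f2, target)                   # feature 2 alone
mi_pair = mutual_information(list(zip(f1, f2)), target)  # both together

print(f"I(f1; target)     = {mi_f1:.3f} bits")
print(f"I(f2; target)     = {mi_f2:.3f} bits")
print(f"I(f1, f2; target) = {mi_pair:.3f} bits")
```

A selector that scores features one at a time would rank both features as irrelevant here, even though together they predict the target perfectly. This is the kind of case in which removing individually weak features harms classification.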
