Quantifying and Visualizing Attribute Interactions

Interactions are patterns between several attributes in data that cannot be inferred from any subset of these attributes. While mutual information is a well-established approach to evaluating the interaction between two attributes, we surveyed its generalizations for quantifying interactions among several attributes. We have chosen McGill's interaction information, which has been independently rediscovered a number of times under various names in various disciplines, because of its many intuitively appealing properties. We apply interaction information to visually present the most important interactions in the data. Visualization of interactions has provided insight into the structure of data in a number of domains: it has identified redundant attributes and opportunities for constructing new features, revealed unexpected regularities in data, and helped during the construction of predictive models. We illustrate the methods on numerous examples. A machine learning method that disregards interactions may fall into two traps: myopia arises when a learning algorithm assumes independence in spite of interactions, whereas fragmentation arises from assuming an interaction in spite of independence.
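As a minimal sketch of the quantities mentioned above, the Python below estimates mutual information and three-way interaction information from discrete data. It uses plug-in (relative-frequency) entropy estimates.

It assumes the sign convention I(A;B;C) = I(A;B|C) - I(A;B). Under this convention, a positive value signals synergy and a negative value signals redundancy. Some authors flip the sign for three variables; co-information, for instance, is I(A;B) - I(A;B|C).

All function names here are hypothetical and are not the paper's implementation. In particular, `rank_interactions` simply orders attribute pairs by the magnitude of their interaction with a label. It is only a crude stand-in for choosing which interactions to visualize.

```python
from collections import Counter
from collections.abc import Hashable, Mapping, Sequence
from itertools import combinations
from math import log2


def entropy(*columns: Sequence[Hashable]) -> float:
    """Joint Shannon entropy, in bits, of one or more equal-length discrete
    columns, using plug-in (relative-frequency) probability estimates."""
    rows = list(zip(*columns))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(a, b)


def interaction_information(
    a: Sequence[Hashable], b: Sequence[Hashable], c: Sequence[Hashable]
) -> float:
    """Assumed convention: I(A;B;C) = I(A;B|C) - I(A;B).
    Positive values indicate synergy, negative values redundancy.
    Expanded into joint entropies so no conditional terms are needed."""
    return (entropy(a, b) + entropy(a, c) + entropy(b, c)
            - entropy(a) - entropy(b) - entropy(c)
            - entropy(a, b, c))


def rank_interactions(
    attributes: Mapping[str, Sequence[Hashable]], label: Sequence[Hashable]
) -> list[tuple[tuple[str, str], float]]:
    """Hypothetical helper: score each attribute pair by its interaction
    information with the label, strongest (by absolute value) first."""
    scored = [
        ((name_a, name_b),
         interaction_information(attributes[name_a], attributes[name_b], label))
        for name_a, name_b in combinations(attributes, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Eight rows covering all combinations of three binary attributes;
    # the label is x1 XOR x2, and x3 is irrelevant to it.
    x1 = [0, 0, 0, 0, 1, 1, 1, 1]
    x2 = [0, 0, 1, 1, 0, 0, 1, 1]
    x3 = [0, 1, 0, 1, 0, 1, 0, 1]
    y = [p ^ q for p, q in zip(x1, x2)]

    # Each attribute on its own carries no information about y ...
    print(mutual_information(x1, y), mutual_information(x2, y))
    # ... yet the pair (x1, x2) interacts strongly with y.
    for pair, score in rank_interactions({"x1": x1, "x2": x2, "x3": x3}, y):
        print(pair, round(score, 3))
```

On the XOR example, neither x1 nor x2 alone has any mutual information with y. Yet the pair (x1, x2) scores +1 bit, while both pairs involving x3 score 0. This is the situation in which a learner that evaluates attributes one at a time would be myopic. Plug-in estimates are unreliable on small samples, so real data would call for a significance test or a corrected estimator.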

[1] J. Morgan et al. Problems in the Analysis of Survey Data, and a Proposal. 1963.

[2] C. Judd et al. Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 1993.

[3] Eduardo Perez. Learning despite complex attribute interaction: an approach based on relational operators. 1997.

[4] Huaiyu Zhu. On Information and Sufficiency. 1997.

[5] Igor Kononenko et al. Semi-Naive Bayesian Classifier. EWSL, 1991.

[6] L. Breiman. Random Forests--Random Features. 1999.

[7] Michael Satosi Watanabe et al. Information Theoretical Analysis of Multivariate Correlation. IBM J. Res. Dev., 1960.

[8] J. Kirkwood et al. The Radial Distribution Function in Liquids. 1942.

[9] Matsuda et al. Physical nature of higher-order mutual information: intrinsic correlations and frustration. Physical Review E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 2000.

[10] G. Yule. On the Theory of Correlation for any Number of Variables, Treated by a New System of Notation. 1907.

[11] Steven W. Norton et al. Generating Better Decision Trees. IJCAI, 1989.

[12] A. J. Bell. The Co-Information Lattice. 2003.

[13] Ricardo Vilalta et al. A Decomposition of Classes via Clustering to Explain and Improve Naive Bayes. ECML, 2003.

[14] Charles X. Ling et al. Learnability of Augmented Naive Bayes in Nominal Domains. ICML, 2001.

[15] Usama M. Fayyad et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. IJCAI, 1993.

[16] Larry A. Rendell et al. Lookahead Feature Construction for Learning Hard Concepts. International Conference on Machine Learning, 1993.

[17] Daphne Koller et al. Toward Optimal Feature Selection. ICML, 1996.

[18] Charles R. Meyer et al. Multi-variate Mutual Information for Registration. MICCAI, 1999.

[19] Michael J. Pazzani et al. Searching for Dependencies in Bayesian Classifiers. AISTATS, 1995.

[20] D. Haussler et al. Boolean Feature Discovery in Empirical Learning. Machine Learning, 1990.

[21] Eibe Frank et al. Logistic Model Trees. ECML, 2003.

[22] Ramón López de Mántaras et al. A distance-based attribute selection measure for decision tree induction. Machine Learning, 1991.

[23] William J. McGill. Multivariate information transmission. Trans. IRE Prof. Group Inf. Theory, 1954.

[24] H. Blalock et al. Theory Construction: From Verbal to Mathematical Formulations. 1970.

[25] Hinrich Schütze et al. Book Reviews: Foundations of Statistical Natural Language Processing. CL, 1999.

[26] Aleks Jakulin et al. Attribute Interactions in Machine Learning. 2003.

[27] J. Jaccard. Interaction Effects in Factorial Analysis of Variance. 1997.

[28] G. Tononi et al. Theoretical neuroanatomy: relating anatomical and functional connectivity in graphs and cortical connection matrices. Cerebral Cortex, 2000.

[29] Marko Robnik-Sikonja et al. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Applied Intelligence, 2004.

[30] V. Vedral. The role of relative entropy in quantum information theory. quant-ph/0102094, 2001.

[31] M. Bartlett. Contingency Table Interactions. 1935.

[32] László Orlóci et al. Biodiversity analysis: issues, concepts, techniques. 2002.

[33] Ivan Bratko et al. Attribute Interactions in Medical Data Analysis. AIME, 2003.

[34] Irving John Good et al. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. 1965.

[35] Mark A. Hall et al. Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. ICML, 1999.

[36] J. Ross Quinlan et al. Induction of Decision Trees. Machine Learning, 1986.

[37] Donald Michie et al. Problem Decomposition and the Learning of Skills. ECML, 1995.

[38] T. Tsujishita et al. On Triple Mutual Information. 1994.

[40] Alen D. Shapiro et al. Structured induction in expert systems. 1987.

[41] Michel Grabisch et al. An axiomatic approach to the concept of interaction among players in cooperative games. Int. J. Game Theory, 1999.

[42] Ivan Bratko et al. Learning by Discovering Concept Hierarchies. Artif. Intell., 1999.

[43] Matthias Schroder et al. Logistic Regression: A Self-Learning Text. 2003.

[44] J. Darroch. Multiplicative and additive interaction in contingency tables. 1974.

[45] Thomas M. Cover et al. Elements of Information Theory. 2005.

[46] D. A. Kenny et al. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 1986.

[47] H. O. Lancaster. The chi-squared distribution. 1971.

[48] Thomas Wennekers et al. Spatial and temporal stochastic interaction in neuronal assemblies. Theory in Biosciences, 2003.

[49] Raymond W. Yeung et al. A new outlook of Shannon's information measures. IEEE Trans. Inf. Theory, 1991.

[50] David A. Bell et al. Learning Bayesian networks from data: An information-theory based approach. Artif. Intell., 2002.

[51] Ron Kohavi et al. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. KDD, 1996.

[52] Judea Pearl et al. Probabilistic reasoning in intelligent systems. 1988.

[53] Larry A. Rendell et al. Global Data Analysis and the Fragmentation Problem in Decision Tree Induction. ECML, 1997.

[54] Kathryn B. Laskey et al. Neural Coding: Higher-Order Temporal Patterns in the Neurostatistics of Cell Assemblies. Neural Computation, 2000.

[55] Marko Robnik-Sikonja et al. Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning, 2003.

[56] C. Rajski et al. A Metric Space of Discrete Probability Distributions. Inf. Control, 1961.

[57] N. J. Cerf et al. Entropic Bell inequalities. 1997.

[58] Alex Alves Freitas et al. Understanding the Crucial Role of Attribute Interaction in Data Mining. Artificial Intelligence Review, 2001.

[59] Trevor J. Hastie et al. Discriminative vs Informative Learning. KDD, 1997.

[60] Nicolette de Keizer et al. Integrating classification trees with local logistic regression in Intensive Care prognosis. Artif. Intell. Medicine, 2003.

[61] Gal Chechik et al. Group Redundancy Measures Reveal Redundancy Reduction in the Auditory Pathway. NIPS, 2001.

[62] William Bialek et al. Synergy in a Neural Code. Neural Computation, 2000.

[63] Brendan J. Frey et al. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory, 2001.

[64] David J. C. MacKay et al. Information Theory, Inference, and Learning Algorithms. IEEE Transactions on Information Theory, 2004.

[65] Andrew K. C. Wong et al. Typicality, Diversity, and Feature Pattern of an Ensemble. IEEE Transactions on Computers, 1975.

[66] Nir Friedman et al. Bayesian Network Classifiers. Machine Learning, 1997.

[67] Xintao Wu et al. Screening and interpreting multi-item associations based on log-linear modeling. KDD '03, 2003.

[68] Aleks Jakulin. Attribute interactions in machine learning: master's thesis. 2002.

[69] Huan Liu et al. Fragmentation problem and automated feature construction. Proceedings Tenth IEEE International Conference on Tools with Artificial Intelligence, 1998.

[70] Larry A. Rendell et al. A Practical Approach to Feature Selection. ML, 1992.

[71] Te Sun Han et al. Multiple Mutual Informations and Multiple Interactions in Frequency Data. Inf. Control, 1980.

[72] Ivan Bratko et al. Analyzing Attribute Dependencies. PKDD, 2003.

[73] Thomas M. Cover et al. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). 2006.

[74] L. A. Goodman et al. Measures of association for cross classifications. 1979.

[75] Marvin A. Kastenbaum et al. On the Hypothesis of No "Interaction" In a Multi-way Contingency Table. 1956.

[76] Stephen D. Bay. Multivariate Discretization for Set Mining. Knowledge and Information Systems, 2001.

[77] Ivo Düntsch et al. On Model Evaluation, Indexes of Importance, and Interaction Values in Rough Set Analysis. Rough-Neural Computing: Techniques for Computing with Words, 2004.

[78] M. Studený et al. The Multiinformation Function as a Tool for Measuring Stochastic Dependence. Learning in Graphical Models, 1998.

[79] Pedro M. Domingos et al. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 1997.