Improved Comprehensibility and Reliability of Explanations via Restricted Halfspace Discretization

A number of two-class classification methods first discretize each attribute of two given training sets and then construct a propositional DNF formula that evaluates to True for one of the two discretized training sets and to False for the other. The formula is not merely a classification tool; it constitutes a useful explanation of the differences between the two underlying populations, provided it is comprehensible to humans and reliable. This paper shows that both the comprehensibility and the reliability of such formulas can sometimes be improved by a discretization scheme that discretizes linear combinations of a small number of attributes.
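The core idea, discretizing a linear combination of attributes into a Boolean feature, can be illustrated with a minimal sketch. This is a hypothetical construction, not the paper's exact procedure: the direction of the linear combination is taken as the difference of the two class means, and the cutpoint is chosen to maximize training accuracy over candidate midpoints. The resulting literal "w · x >= c" is one restricted halfspace cut that a DNF formula could then use.

```python
# Hypothetical sketch of a restricted-halfspace cut: project records onto a
# linear combination of attributes and pick a threshold, yielding one Boolean
# feature. Direction and threshold rules here are illustrative assumptions.

def halfspace_feature(pos, neg):
    """Return (w, c) such that the literal 'w . x >= c' separates the two
    training sets as well as possible along the mean-difference direction."""
    dim = len(pos[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    # Direction of the linear combination: difference of class means
    # (an illustrative choice, not the paper's method).
    w = [mean(pos, j) - mean(neg, j) for j in range(dim)]
    dot = lambda x: sum(wj * xj for wj, xj in zip(w, x))
    # Candidate cutpoints: midpoints between consecutive projected values.
    vals = sorted(dot(x) for x in pos + neg)
    cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    def accuracy(c):
        hits = sum(dot(x) >= c for x in pos) + sum(dot(x) < c for x in neg)
        return hits / (len(pos) + len(neg))
    return w, max(cuts, key=accuracy)

# Toy data: two small populations that no single-attribute cut separates
# as cleanly as the combined direction does.
pos = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5)]   # records labeled True
neg = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]   # records labeled False
w, c = halfspace_feature(pos, neg)
lit = lambda x: sum(wj * xj for wj, xj in zip(w, x)) >= c
assert all(lit(x) for x in pos) and not any(lit(x) for x in neg)
```

In a full system, several such literals over different small attribute subsets would serve as the discretized inputs from which the DNF formula is learned; keeping each combination to few attributes is what preserves comprehensibility.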

[1]  Klaus Truemper,et al.  Learning Logic Formulas and Related Error Distributions , 2006 .

[2]  R. Casey,et al.  Advances in Pattern Recognition , 1971 .

[3]  Willi Klösgen,et al.  Explora: A Multipattern and Multistrategy Discovery Assistant , 1996, Advances in Knowledge Discovery and Data Mining.

[4]  Toshihide Ibaraki,et al.  An Implementation of Logical Analysis of Data , 2000, IEEE Trans. Knowl. Data Eng..

[5]  N. Cowan The magical number 4 in short-term memory: A reconsideration of mental storage capacity , 2001, Behavioral and Brain Sciences.

[6]  Huan Liu,et al.  Toward integrating feature selection algorithms for classification and clustering , 2005, IEEE Transactions on Knowledge and Data Engineering.

[7]  Nir Friedman,et al.  Discretizing Continuous Attributes While Learning Bayesian Networks , 1996, ICML.

[8]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.

[9]  Sam Chao,et al.  Multivariate interdependent discretization for continuous attribute , 2005, Third International Conference on Information Technology and Applications (ICITA'05).

[10]  Toshihide Ibaraki,et al.  Logical analysis of numerical data , 1997, Math. Program..

[11]  Klaus Truemper,et al.  Transformation of Rational Data and Set Data to Logic Data , 2006 .

[12]  Fabrice Muhlenbach,et al.  Multivariate supervised discretization, a neighborhood graph approach , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[13]  Ron Kohavi,et al.  Error-Based and Entropy-Based Discretization of Continuous Features , 1996, KDD.

[14]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[15]  Peter Clark,et al.  Rule Induction with CN2: Some Recent Improvements , 1991, EWSL.

[16]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[17]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[18]  Gregory F. Cooper,et al.  Learning Hybrid Bayesian Networks from Data , 1999, Learning in Graphical Models.

[19]  Klaus Truemper,et al.  Discretization of Rational Data , 2008 .

[20]  Evangelos Triantaphyllou,et al.  Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques , 2009 .

[21]  Huaiqing Wang,et al.  A discretization algorithm based on a heterogeneity criterion , 2005, IEEE Transactions on Knowledge and Data Engineering.

[22]  Nada Lavrac,et al.  Active subgroup mining: a case study in coronary heart disease risk group detection , 2003, Artif. Intell. Medicine.

[23]  Lukasz Kurgan,et al.  Data Mining and Knowledge Discovery Data Mining and Knowledge Discovery , 2002 .

[24]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[25]  Nada Lavrac,et al.  Induction of comprehensible models for gene expression datasets by subgroup discovery methodology , 2004, J. Biomed. Informatics.

[26]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[27]  Nada Lavrac,et al.  Expert-Guided Subgroup Discovery: Methodology and Application , 2011, J. Artif. Intell. Res..

[28]  Peter A. Flach,et al.  Decision Support Through Subgroup Discovery: Three Case Studies and the Lessons Learned , 2004, Machine Learning.

[29]  Stefan Wrobel,et al.  An Algorithm for Multi-relational Discovery of Subgroups , 1997, PKDD.

[30]  Stephen D. Bay Multivariate discretization of continuous variables for set mining , 2000, KDD '00.

[31]  J. Bain,et al.  How Many Variables Can Humans Process? , 2005, Psychological science.

[32]  Syed Sibte Raza Abidi,et al.  Symbolic exposition of medical data-sets: a data mining workbench to inductively derive data-defining symbolic rules , 2002, Proceedings of 15th IEEE Symposium on Computer-Based Medical Systems (CBMS 2002).

[33]  Frank Puppe,et al.  Subgroup Mining for Interactive Knowledge Refinement , 2005, AIME.

[34]  Klaus Truemper,et al.  A MINSAT Approach for Learning in Logic Domains , 2002, INFORMS J. Comput..

[35]  G. A. Miller THE PSYCHOLOGICAL REVIEW THE MAGICAL NUMBER SEVEN, PLUS OR MINUS TWO: SOME LIMITS ON OUR CAPACITY FOR PROCESSING INFORMATION 1 , 1956 .

[36]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[37]  Ruoming Jin,et al.  Data Discretization Unification , 2007, ICDM.

[38]  Andrew K. C. Wong,et al.  A fuzzy approach to partitioning continuous attributes for classification , 2006, IEEE Transactions on Knowledge and Data Engineering.

[39]  Marc Boullé,et al.  Khiops: A Statistical Discretization Method of Continuous Attributes , 2004, Machine Learning.

[40]  Yoram Singer,et al.  A simple, fast, and effective rule learner , 1999, AAAI 1999.

[41]  Daphne Koller,et al.  Toward Optimal Feature Selection , 1996, ICML.

[42]  Petra Perner,et al.  Multi-interval Discretization Methods for Decision Tree Learning , 1998, SSPR/SPR.

[43]  Aijun An Learning classification rules from data , 2003 .

[44]  Marc Boullé,et al.  MODL: A Bayes optimal discretization method for continuous attributes , 2006, Machine Learning.

[45]  N. Cowan,et al.  Separating cognitive capacity from knowledge: a new hypothesis , 2007, Trends in Cognitive Sciences.

[46]  Jerzy W. Grzymala-Busse,et al.  Global discretization of continuous attributes as preprocessing for machine learning , 1996, Int. J. Approx. Reason..

[47]  Stephen D. Bay,et al.  Detecting Group Differences: Mining Contrast Sets , 2001, Data Mining and Knowledge Discovery.

[48]  Yann LeCun,et al.  Measuring the VC-Dimension of a Learning Machine , 1994, Neural Computation.

[49]  Geoffrey I. Webb,et al.  Proportional k-Interval Discretization for Naive-Bayes Classifiers , 2001, ECML.

[50]  Evangelos Triantaphyllou Data Mining and Knowledge Discovery via Logic-Based Methods: Theory, Algorithms, and Applications , 2010 .

[51]  Ruoming Jin,et al.  Data discretization unification , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[52]  Ning Zhong,et al.  Methodologies for Knowledge Discovery and Data Mining , 2002, Lecture Notes in Computer Science.

[53]  Evangelos Triantaphyllou,et al.  Data Mining and Knowledge Discovery via Logic-Based Methods , 2010 .

[54]  Nick Cercone,et al.  Discretization of Continuous Attributes for Learning Classification Rules , 1999, PAKDD.

[55]  Lukasz A. Kurgan,et al.  CAIM discretization algorithm , 2004, IEEE Transactions on Knowledge and Data Engineering.

[56]  Gregory F. Cooper,et al.  A Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data , 1998, UAI.