Logical analysis of data: classification with justification

Learning from examples is a frequently arising challenge, with a large number of algorithms proposed in the classification, data mining, and machine learning literature. The quality of such algorithms is usually evaluated ex post, on an experimental basis: their performance is measured either by cross-validation on benchmark data sets or by clinical trials. Few approaches evaluate the learning process ex ante, on its own merits. In this paper, we discuss a property of rule-based classifiers that we call “justifiability”, which focuses on the type of information extracted from the given training set in order to classify new observations. We investigate some interesting mathematical properties of justifiable classifiers. In particular, we establish the existence of justifiable classifiers, and we show that several well-known learning approaches, such as decision trees or nearest-neighbor methods, automatically produce justifiable classifiers. We also identify maximal subsets of observations which must be classified in the same way by every justifiable classifier. Finally, we illustrate by a numerical example that using classifiers based on “most justifiable” rules does not seem to lead to overfitting, even though it involves an element of optimization.
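To make the rule-based setting concrete, the sketch below is a minimal, illustrative take on “classification with justification” in the spirit of logical analysis of data: a positive pattern is a conjunction of literals that covers at least one positive training observation and no negative one, and a positive prediction is returned together with the covering pattern and the training points supporting it. The setup (binary attributes, exhaustive low-degree enumeration) and the helper names `covers`, `positive_patterns`, and `classify_with_justification` are our own illustrative assumptions, not the paper's formal definition of justifiability or its algorithms.

```python
# Illustrative sketch only: a toy "classify with justification" routine in the
# spirit of logical analysis of data (LAD). The enumeration heuristic and the
# function names are our own assumptions, not the algorithm of the paper.
from itertools import combinations

def covers(term, x):
    """A term is a tuple of (index, value) literals; it covers x if x agrees on all of them."""
    return all(x[i] == v for i, v in term)

def positive_patterns(pos, neg, max_degree=2):
    """Enumerate low-degree terms covering at least one positive observation
    (guaranteed by construction) and no negative observation: 'positive patterns'."""
    n = len(pos[0])
    patterns = []
    for d in range(1, max_degree + 1):
        for idxs in combinations(range(n), d):
            for p in pos:
                term = tuple((i, p[i]) for i in idxs)
                if term not in patterns and not any(covers(term, q) for q in neg):
                    patterns.append(term)
    return patterns

def classify_with_justification(x, pos, patterns):
    """Classify x as positive iff some positive pattern covers it; return the
    covering pattern and the positive training points it covers as justification."""
    for term in patterns:
        if covers(term, x):
            return 1, term, [p for p in pos if covers(term, p)]
    return 0, None, []

# Tiny example with three binary attributes.
pos = [(1, 1, 0), (1, 0, 1)]
neg = [(0, 0, 0), (0, 1, 1)]
patterns = positive_patterns(pos, neg)
label, term, support = classify_with_justification((1, 1, 1), pos, patterns)
print(label, term, support)  # 1 ((0, 1),) [(1, 1, 0), (1, 0, 1)]
```

In this toy run, the new point (1, 1, 1) is classified as positive, and the decision is justified by the degree-one pattern “attribute 0 equals 1”, which covers both positive training points and neither negative one; the classification can thus be traced back to specific evidence extracted from the training set.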
