Calibrated Lazy Associative Classification

Classification is an important problem in data mining. Given an example x and a class c, a classifier typically works by estimating the probability that x is a member of c (i.e., the membership probability). Well-calibrated classifiers are those that provide accurate estimates of class membership probabilities: the estimated probability p(c|x) is close to p(c | p(c|x)), the true, empirical probability that x is a member of c given that the probability estimated by the classifier is p(c|x). Calibration is not a necessary property for producing accurate classifiers, and thus most research has focused on direct accuracy-maximization strategies (e.g., maximum margin) rather than on calibration. However, non-calibrated classifiers are problematic in applications where the reliability associated with a prediction must be taken into account (e.g., cost-sensitive classification, cautious classification, etc.). In these applications, a sensible use of the classifier must be based on the reliability of its predictions, and thus the classifier must be well calibrated. In this paper we show that lazy associative classifiers (LAC) are accurate and can be well calibrated using a well-known, sound entropy-minimization method. We explore important applications where these characteristics (i.e., accuracy and calibration) are relevant, and we demonstrate empirically that LAC drastically outperforms other classifiers, such as SVMs, Naive Bayes, and Decision Trees (even after these classifiers are calibrated by specific methods). Additional highlights of LAC include the ability to incorporate reliable predictions to improve training and the ability to refrain from doubtful predictions.
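To make the calibration criterion concrete, the sketch below (not the paper's method; the binning scheme, function name, and parameters are illustrative assumptions) compares the mean estimated probability p(c|x) in each score bin with the empirical fraction of positives in that bin, i.e., an estimate of p(c | p(c|x)). A well-calibrated classifier keeps this gap small across bins.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Illustrative calibration check: compare mean predicted probability
    to the empirical positive rate within fixed-width probability bins.

    probs  : estimated membership probabilities p(c|x), values in [0, 1]
    labels : 0/1 indicators of true membership in class c
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # include the right endpoint in the last bin
        in_bin = (probs >= lo) & ((probs < hi) if i < n_bins - 1 else (probs <= hi))
        if not np.any(in_bin):
            continue
        mean_pred = probs[in_bin].mean()   # average estimated p(c|x) in the bin
        frac_pos = labels[in_bin].mean()   # empirical p(c | p(c|x)) for the bin
        ece += in_bin.mean() * abs(mean_pred - frac_pos)
    return ece
```

For example, a classifier whose predictions in the 0.8-0.9 bin are correct only 60% of the time is over-confident there, which is exactly the failure mode that matters in cost-sensitive or cautious classification settings.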
