ACME: An Associative Classifier Based on Maximum Entropy Principle

Recent studies in classification have proposed ways of exploiting the association rule mining paradigm. These studies have performed extensive experiments to show that their techniques are both efficient and accurate. However, existing studies in this paradigm either provide no theoretical justification for their approaches or assume independence between some parameters. In this work, we propose a new classifier based on association rule mining. Our classifier rests on the maximum entropy principle for its statistical basis and makes no independence assumptions that are not inferred from the given dataset. We use the classical generalized iterative scaling (GIS) algorithm to build our classification model. We show that GIS fails in some cases when itemsets are used as features, and we provide modifications that rectify this problem; the modified GIS also runs much faster than the original. We further describe techniques that make GIS tractable for large feature spaces: in particular, a new technique that divides a feature space into independent clusters, each of which can be handled separately. Our experimental results show that our classifier is generally more accurate than existing classification methods.
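To make the abstract's starting point concrete, the following is a minimal sketch of the classical GIS algorithm for a conditional maximum-entropy classifier over binary (itemset-style) features. It is a generic illustration, not the paper's modified algorithm: the toy data, the slack-feature construction, and the clamping of zero expectations are all assumptions made here for the sketch. The clamp also hints at one well-known difficulty with sparse itemset features (a feature that never co-occurs with a class has zero empirical expectation, so the multiplicative update is undefined), though this is not necessarily the exact failure case the paper addresses.

```python
# Minimal sketch of classical Generalized Iterative Scaling (GIS) for a
# conditional maximum-entropy classifier with binary features.
# Illustrative only; toy data and constants are hypothetical.
import numpy as np

# Toy training data: rows are binary feature vectors, y holds class labels.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
y = np.array([0, 0, 1, 1])
classes = np.unique(y)

n, d = X.shape
k = len(classes)

# GIS requires every (x, y) pair to have the same total feature count C;
# a standard trick is to append a "slack" feature that pads each row to C.
C = X.sum(axis=1).max() + 1.0
slack = C - X.sum(axis=1)
Xp = np.hstack([X, slack[:, None]])   # (n, d+1); each row now sums to C

# One weight per (feature, class) for joint features f_{j,c}(x, y) = x_j * [y == c].
lam = np.zeros((d + 1, k))

# Empirical expectations E_emp[f_{j,c}] = (1/n) * sum of x_j over examples with y == c.
emp = np.zeros((d + 1, k))
for c in classes:
    emp[:, c] = Xp[y == c].sum(axis=0) / n

for _ in range(200):
    # Model distribution p(c | x) under the current weights (stabilized softmax).
    scores = Xp @ lam
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)

    # Model expectations E_model[f_{j,c}] = (1/n) * sum_x p(c | x) * x_j.
    model = (Xp.T @ p) / n

    # Classical GIS update: lambda += (1/C) * log(E_emp / E_model).
    # The clamp avoids log(0) when an itemset never fires with a class.
    lam += np.log(np.maximum(emp, 1e-12) / np.maximum(model, 1e-12)) / C

print(np.round(p, 3))  # class probabilities for the training points
```

The update is multiplicative in the exponential weights: each iteration scales a weight by how far the model's expectation of its feature is from the empirical expectation, damped by 1/C. Because C grows with the size of the largest feature set, long itemsets slow convergence, which is one reason modifications to GIS are attractive in this setting.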
