Learning Locally Weighted C4.4 for Class Probability Estimation

In many real-world data mining applications, accurate class probability estimates are often required to make optimal decisions. For example, in direct marketing we often need to deploy different promotion strategies to customers with different likelihoods (probabilities) of buying certain products. When the learning task is to build a model that produces accurate class probability estimates, C4.4 is one of the most popular algorithms because of its efficiency and effectiveness. In this paper, we present a locally weighted version of C4.4 that improves its class probability estimation performance by combining locally weighted learning with C4.4. We call the improved algorithm locally weighted C4.4 (LWC4.4 for short). We experimentally tested LWC4.4 on all 36 UCI data sets selected by Weka and compared it with related algorithms: C4.4, NB, KNN, NBTree, and LWNB. The experimental results show that LWC4.4 significantly outperforms all the other algorithms in terms of conditional log likelihood (CLL). Our work thus provides an effective algorithm for producing accurate class probability estimates.
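To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of locally weighted C4.4-style class probability estimation in Python. scikit-learn's unpruned, entropy-based DecisionTreeClassifier stands in for C4.4, and the neighborhood size k, the linear distance-weighting kernel, and the Laplace parameter alpha are illustrative assumptions. For each test instance, the learner finds its k nearest training instances, weights them by distance, grows an unpruned tree on the weighted neighborhood, and reads a Laplace-corrected class distribution from the leaf the test instance falls into.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def lw_c44_proba(X_train, y_train, x_test, k=50, alpha=1.0):
    """Class probabilities for one test instance from a locally
    weighted, unpruned (C4.4-like) decision tree. k and alpha are
    illustrative defaults, not the paper's settings."""
    # 1. Find the k nearest training instances of the test instance.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, idx = nn.kneighbors(x_test.reshape(1, -1))
    dist, idx = dist[0], idx[0]

    # 2. Weight neighbors linearly so closer instances count more;
    #    the small floor keeps every neighbor in the local training set.
    weights = np.clip(1.0 - dist / (dist.max() + 1e-12), 1e-6, None)

    # 3. Grow an unpruned, entropy-based tree (a C4.4-like learner)
    #    on the weighted local neighborhood.
    tree = DecisionTreeClassifier(criterion="entropy")
    tree.fit(X_train[idx], y_train[idx], sample_weight=weights)

    # 4. Collect the weighted class counts in the leaf the test
    #    instance reaches, then apply the Laplace correction that
    #    C4.4 uses instead of raw leaf frequencies.
    leaf = tree.apply(x_test.reshape(1, -1))[0]
    in_leaf = tree.apply(X_train[idx]) == leaf
    counts = np.array([weights[in_leaf & (y_train[idx] == c)].sum()
                       for c in tree.classes_])
    probs = (counts + alpha) / (counts.sum() + alpha * len(tree.classes_))
    return tree.classes_, probs
```

For example, `classes, probs = lw_c44_proba(X[:-1], y[:-1], X[-1])` on a numeric UCI-style data set returns the label order and the corresponding probability estimates. The evaluation metric used in the paper, conditional log likelihood, is then simply CLL(M | D) = sum_i log P_M(c_i | x_i) over the test instances (x_i, c_i), so higher values indicate better probability estimates.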
