Knowledge-based system for text classification using ID6NB algorithm

This paper presents a novel algorithm named ID6NB for extending decision tree induced by Quinlan's non-incremental ID3 algorithm. The presented approach is aimed at suggesting the solutions for few unhandled exceptions of the Decision tree induction algorithms such as (i) the situation in which the majority voting makes incorrect decision (generating two different types of rules for same data), and (ii) in case of dimensionality reduction by decision tree induction algorithms, the determination of appropriate attribute at a node where two or more attributes have equal highest information gain. Exception due to majority voting is handled with the help of Naive Bayes algorithm and also novel solutions are given for dimensionality reduction. As a result, the classification accuracy has drastically improved. An extensive experimental evaluation on a number of real and synthetic databases shows that ID6NB is a state-of-the-art classification algorithm that outperforms well than other methods of decision tree learning.

[1]  Douglas H. Fisher,et al.  A Case Study of Incremental Concept Induction , 1986, AAAI.

[2]  Sebastian Thrun,et al.  The MONK''s Problems-A Performance Comparison of Different Learning Algorithms, CMU-CS-91-197, Sch , 1991 .

[3]  Paul E. Utgoff,et al.  Decision Tree Induction Based on Efficient Tree Restructuring , 1997, Machine Learning.

[4]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[5]  Hiroshi Motoda,et al.  Feature Selection for Knowledge Discovery and Data Mining , 1998, The Springer International Series in Engineering and Computer Science.

[6]  Paul E. Utgoff,et al.  ID5: An Incremental ID3 , 1987, ML.

[7]  Richard Granger,et al.  Incremental Learning from Noisy Data , 1986, Machine Learning.

[8]  Paul E. Utgoff,et al.  Incremental Induction of Decision Trees , 1989, Machine Learning.

[9]  Richard Granger,et al.  Beyond Incremental Processing: Tracking Concept Drift , 1986, AAAI.

[10]  Paul E. Utgoff,et al.  An Improved Algorithm for Incremental Induction of Decision Trees , 1994, ICML.

[11]  J. Ross Quinlan,et al.  Simplifying Decision Trees , 1987, Int. J. Man Mach. Stud..

[12]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[13]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[14]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[15]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery: An Overview , 1996, Advances in Knowledge Discovery and Data Mining.

[16]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[17]  Paul E. Utgoff,et al.  Improved Training Via Incremental Learning , 1989, ML.

[18]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.