Noise Tolerant Classification by Chi Emerging Patterns

Classification is an important data mining problem, and noise tolerance is a desirable property of a classifier. Emerging Patterns (EPs) are itemsets whose supports change significantly from one data class to another. In this paper, we first introduce Chi Emerging Patterns (Chi EPs), which are more resistant to noise than other kinds of EPs. We then use Chi EPs in a probabilistic approach to classification. The resulting classifier, Bayesian Classification by Chi Emerging Patterns (BCCEP), handles noise well because of the inherent noise tolerance of the Bayesian approach and the high-quality patterns used in the probability approximation. Our empirical study shows that the method is superior to other well-known classification methods such as NB, C4.5, SVM and JEP-C in terms of overall predictive accuracy, on “noisy” as well as “clean” benchmark datasets from the UCI Machine Learning Repository. Out of the 116 cases, BCCEP wins on 70, NB on 30, C4.5 on 33, SVM on 32 and JEP-C on 21.
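To make the notion of an itemset whose support changes between classes concrete, the following is a minimal sketch that computes the support of an itemset in two classes and the ratio between them (commonly called the growth rate in the EP literature). The function names and the toy data are assumptions for illustration only; the sketch does not implement the additional conditions that define Chi EPs, nor the BCCEP classifier.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(pattern: Itemset, instances: List[Itemset]) -> float:
    """Fraction of instances that contain every item of `pattern`."""
    if not instances:
        return 0.0
    hits = sum(1 for inst in instances if pattern <= inst)
    return hits / len(instances)


def growth_rate(pattern: Itemset,
                background: List[Itemset],
                target: List[Itemset]) -> float:
    """Ratio of the pattern's support in `target` to that in `background`.

    Returns infinity when the pattern occurs only in `target`
    and 0.0 when it occurs in neither class.
    """
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Hypothetical toy data: each instance is a set of items.
background = [frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("c")]
target = [frozenset("ab"), frozenset("abc"), frozenset("ab"), frozenset("b")]

pattern = frozenset("ab")
print(support(pattern, background))              # support in the background class
print(support(pattern, target))                  # support in the target class
print(growth_rate(pattern, background, target))  # how sharply support changes
```

An itemset with a large growth rate is a candidate EP for the target class; how large the change must be, and which further tests a pattern must pass, depends on the particular kind of EP.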
