Application-Independent Feature Construction from Noisy Samples

When training classifiers, the presence of noise can severely harm their performance. In this paper, we focus on "non-class" attribute noise and consider how a frequent fault-tolerant (FFT) pattern mining task can be used to support noise-tolerant classification. Our method is based on an application-independent strategy for feature construction built on the so-called δ-free patterns. Our experiments on noisy training data show accuracy improvements when the computed features are used instead of the original ones.
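As an illustration of the general idea, the sketch below mines frequent patterns from a toy 0/1 dataset using a simplified (weak) δ-freeness check, then builds boolean features by matching patterns fault-tolerantly, i.e. a pattern still fires when at most ε of its items are missing from a row. This is a minimal sketch under stated assumptions, not the authors' implementation: the dataset, the function names, and the weak-freeness test (comparing the support of a pattern against each of its immediate subsets) are all illustrative.

```python
from itertools import combinations

# Toy 0/1 dataset (illustrative): each row is the set of attributes set to 1.
ROWS = [
    {0, 1, 2}, {0, 1, 2}, {0, 1, 3}, {0, 2, 3},
    {1, 2, 3}, {0, 1, 2, 3}, {0, 1}, {2, 3},
]

def support(pattern, rows):
    """Number of rows containing every item of the pattern."""
    return sum(1 for r in rows if pattern <= r)

def delta_free(pattern, rows, delta):
    """Weak delta-freeness check (simplified): removing any single item
    must increase the support by strictly more than delta exceptions,
    i.e. no near-exact rule holds between the pattern's items."""
    return all(
        support(pattern - {a}, rows) - support(pattern, rows) > delta
        for a in pattern
    )

def mine(rows, min_sup, delta, max_len=3):
    """Enumerate frequent, weakly delta-free itemsets up to max_len items."""
    items = sorted(set().union(*rows))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            p = set(combo)
            if support(p, rows) >= min_sup and delta_free(p, rows, delta):
                found.append(frozenset(p))
    return found

def features(row, patterns, eps=1):
    """Fault-tolerant feature construction: feature j is 1 when at most
    `eps` items of pattern j are absent from the row."""
    return [1 if len(p - row) <= eps else 0 for p in patterns]
```

A classifier would then be trained on the vectors returned by `features` rather than on the raw attributes; the fault tolerance in the match is what lends robustness to attribute noise.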
