Feature construction for classification from noisy examples

In real-world data, the attributes describing objects are often noisy. In supervised classification, noise can severely degrade classifier performance (notably accuracy) as well as the soundness of the decisions made by predictive models. Two kinds of noise can be distinguished: noise on the class attribute, which has been studied extensively in recent years, and noise on the other attributes, for which only a few approaches have recently been developed. Our contribution addresses the problem of noisy non-class attributes in supervised classification. To that end, we propose to construct new robust descriptors based on an approximate condensed representation of frequent itemsets. Experimental results show that classifiers (C4.5 and Naive Bayes) achieve higher accuracy on noisy data augmented with these new descriptors than on the original data.
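The descriptor-construction idea can be illustrated with a plain frequent-itemset miner, a minimal sketch that uses a naive Apriori pass rather than the approximate condensed representation (δ-free sets) proposed in the paper: each sufficiently frequent itemset becomes a new 0/1 descriptor that fires when all of its items co-occur in an object. The function names and the toy dataset below are illustrative, not from the paper.

```python
def frequent_itemsets(transactions, min_support):
    """Return every itemset whose support (fraction of transactions
    containing it) is at least min_support.  Naive level-wise Apriori."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    candidates = [frozenset([i]) for i in items]
    k = 1
    while candidates:
        frequent = []
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                result[c] = support
                frequent.append(c)
        # grow (k+1)-candidates by joining frequent k-itemsets
        candidates = list({a | b for a in frequent for b in frequent
                           if len(a | b) == k + 1})
        k += 1
    return result

def itemset_features(transactions, itemsets):
    """Recode each transaction as a 0/1 vector: one new descriptor per
    mined itemset, set to 1 when the itemset is fully present."""
    order = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
    return [[1 if s <= t else 0 for s in order] for t in transactions]

# Toy 0/1 dataset: four objects described by the items a, b, c.
data = [frozenset(t) for t in
        [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]]
patterns = frequent_itemsets(data, min_support=0.5)
features = itemset_features(data, patterns)
```

The intuition for robustness is that a descriptor built from several co-occurring items tolerates noise on any single attribute better than the attribute itself does; the paper's condensed representation additionally keeps only a compact, non-redundant subset of such patterns.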
