RIB: A Robust Itemset-based Bayesian approach to classification

Real-life data is often affected by noise. To cope with this issue, classification techniques robust to noisy data are needed. Bayesian approaches are known to be fairly robust to noise. However, to compute probability estimates, state-of-the-art Bayesian approaches adopt a lazy pattern-based strategy, which shows some limitations when coping with data affected by a notable amount of noise. This paper proposes RIB (Robust Itemset-based Bayesian classifier), a novel eager, pattern-based Bayesian classifier which discovers frequent itemsets from training data and exploits them to build accurate probability estimates. Enforcing a minimum frequency of occurrence on the considered itemsets reduces the sensitivity of the probability estimates to noise. Furthermore, learning a Bayesian Network that also considers high-order dependences among data, which are usually neglected by traditional Bayesian approaches, appears to be more robust to noise and data overfitting than selecting a small subset of patterns tailored to each test instance. The experiments demonstrate that RIB is, on average, more accurate than most state-of-the-art classifiers, both Bayesian and non-Bayesian, on benchmark datasets into which different kinds and levels of noise are injected. Furthermore, its performance on the same datasets prior to noise injection is competitive with that of state-of-the-art classifiers.
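The role of the minimum frequency constraint can be illustrated with a small sketch. This is not the RIB algorithm itself, whose details the abstract does not give. It is a brute-force illustration that assumes records are dictionaries of attribute/value pairs, enumerates itemsets up to a fixed length, keeps those that meet a relative support threshold, and computes a Laplace-smoothed class-conditional estimate. The function names, the `max_len` and `alpha` parameters, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len=2):
    """Return {itemset: relative support} for itemsets with support >= min_support.

    Brute-force enumeration for illustration only; an itemset is a frozenset
    of (attribute, value) pairs.
    """
    counts = Counter()
    for row in rows:
        items = sorted(row.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(rows)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def class_conditional(rows, labels, itemset, cls, alpha=1.0):
    """Laplace-smoothed estimate of P(itemset | cls) (assumed smoothing scheme)."""
    in_class = [r for r, y in zip(rows, labels) if y == cls]
    hits = sum(1 for r in in_class if itemset <= set(r.items()))
    return (hits + alpha) / (len(in_class) + 2 * alpha)


# Hypothetical toy training set.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

freq = frequent_itemsets(rows, min_support=0.5)
pair = frozenset({("outlook", "sunny"), ("windy", "no")})
p_play = class_conditional(rows, labels, pair, "play")
p_stay = class_conditional(rows, labels, pair, "stay")
```

With a threshold of 0.5, itemsets that occur in a single record (for example, `outlook=rainy`) are discarded, so an isolated noisy record cannot contribute a pattern to the probability estimates.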
