RoughTree: A Classifier with Naive-Bayes and Rough Sets Hybrid in Decision Tree Representation

This paper presents RoughTree, a semi-naive classifier designed to alleviate the attribute interdependence problem of the Naive Bayesian classifier. RoughTree applies the attribute dependence measure from rough set theory, selects the attributes that score highest on this measure, and splits the dataset into subspaces according to those attributes. Splitting proceeds recursively, in the same way a decision tree is grown, until the stopping criterion is satisfied. The result is a tree-like model in which each leaf is replaced by a Naive Bayesian classifier. RoughTree eliminates the attribute dependences in its leaves, and the experimental results show that it can achieve better performance than the Naive Bayesian classifier.
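The selection-and-split step can be illustrated with a minimal sketch. It uses the standard rough-set dependency degree, gamma = |POS| / |U|, where POS is the set of objects whose equivalence class on the conditioning attributes maps to a single value of the target attribute. The abstract does not state how RoughTree scores a candidate attribute, so the scoring rule in `pick_split_attribute` (the mean dependency of the remaining attributes on the candidate) is an assumption made for illustration, as are all function names and the toy data.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, target_attr):
    """Standard rough-set dependency degree of target_attr on cond_attrs."""
    if not rows:
        return 0.0
    # Group objects into equivalence classes by their values on cond_attrs.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row[target_attr])
    # A class belongs to the positive region when it maps to one target value.
    positive = sum(len(vals) for vals in classes.values() if len(set(vals)) == 1)
    return positive / len(rows)


def pick_split_attribute(rows, attrs):
    """Hypothetical scoring rule (not taken from the paper): choose the
    attribute on which the remaining attributes depend most, on average."""
    def score(a):
        others = [b for b in attrs if b != a]
        if not others:
            return 0.0
        return sum(dependency_degree(rows, [a], b) for b in others) / len(others)
    return max(attrs, key=score)


def split(rows, attr):
    """Partition the rows into subspaces, one per value of attr."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[attr]].append(row)
    return dict(parts)


# Toy data (hypothetical): b is fully determined by a, c is independent of both.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 1, "c": 1},
]
best = pick_split_attribute(rows, ["a", "b", "c"])
subspaces = split(rows, best)
```

In a full model, this step would be applied recursively to each subspace until the stopping criterion is met, and a Naive Bayesian classifier would then be trained on the rows that reach each leaf.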
