The combination of multiple classifiers using an evidential reasoning approach

In many domains, when several competing classifiers are available, we want to synthesize some or all of them into a more accurate classifier by means of a combination function. In this paper we propose a 'class-indifferent' method for combining classifier decisions, represented by evidential structures called triplet and quartet, using Dempster's rule of combination. This method is unique in that it distinguishes important elements from trivial ones in representing classifier decisions, makes use of more information than other methods in calculating the support for class labels, and provides a practical way to apply the theoretically appealing Dempster-Shafer theory of evidence to the problem of ensemble learning. We present a formalism for modelling classifier decisions as triplet mass functions, and we establish a range of formulae for combining these mass functions in order to arrive at a consensus decision. In addition, we carry out a comparative study with the alternative simplet and dichotomous structures, and we compare two combination methods, Dempster's rule and majority voting, on the UCI benchmark data to demonstrate the advantage our approach offers.
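
To make the combination step concrete: a triplet mass function commits belief to a small number of focal elements, typically a classifier's top-ranked class labels, and assigns the remaining mass to the whole frame of discernment; Dempster's rule then combines two such functions by multiplying the masses of intersecting focal elements and renormalising away the conflicting mass. Below is a minimal Python sketch of this combination step, assuming triplet-shaped inputs; the function name dempster_combine and the numeric masses in m1 and m2 are illustrative assumptions, not values from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a focal element
    (a frozenset of class labels) to its mass. The rule takes
    the conjunctive combination (intersection of focal elements,
    product of masses) and normalises by 1 - K, where K is the
    total mass assigned to empty intersections (the conflict).
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence; Dempster's rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}

# Hypothetical triplet mass functions over the frame {a, b, c}:
# each classifier puts mass on its two top-ranked labels and leaves
# the remainder on the whole frame, representing ignorance.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frozenset({"b"}): 0.3, frame: 0.1}
m2 = {frozenset({"a"}): 0.5, frozenset({"c"}): 0.3, frame: 0.2}
print(dempster_combine(m1, m2))
# {a} receives most of the combined support, since both classifiers rank it first.
```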
