Trainable fusion rules. I. Large sample size case

A wide selection of standard statistical pattern classification algorithms can be applied as trainable fusion rules when designing neural network ensembles. The focus of the present two-part paper is finite sample effects: the complexity of the base classifiers and fusion rules; the type of outputs the experts provide to the fusion rule; non-linearity of the fusion rule; degradation of the experts and the fusion rule due to the lack of information in the design set; the adaptation of base classifiers to training set size, etc. In this first part, we weigh the arguments for using continuous rather than categorical outputs of the base classifiers and conclude that if one succeeds in having a small number of expert networks working perfectly in different parts of the input feature space, then crisp outputs may be preferable to continuous ones. We then contrast fixed fusion rules with trainable ones and demonstrate situations in which weighted average fusion can outperform simple average fusion. We review statistical classification rules, paying special attention to those linear and non-linear rules that are rarely employed but, in our opinion, could be useful in neural network ensembles. Finally, we consider ideal and sample-based oracle decision rules and illustrate the characteristic features of the diverse fusion rules with an artificial two-dimensional (2D) example in which the base classifiers perform well in different regions of the input feature space.
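The contrast between fixed and trainable fusion can be made concrete with a small sketch. The Python snippet below is our illustration, not the paper's experiment: the Gaussian data, the label corruption used to weaken one expert, and the use of scikit-learn's LogisticRegression both for the base experts and for the trained linear fuser are all assumptions made for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n):
    # Two overlapping Gaussian classes in 2D (means at -1 and +1 on x0).
    n2 = n // 2
    X = np.vstack([rng.normal([-1.0, 0.0], 1.0, size=(n2, 2)),
                   rng.normal([+1.0, 0.0], 1.0, size=(n2, 2))])
    y = np.r_[np.zeros(n2, dtype=int), np.ones(n2, dtype=int)]
    return X, y

X_tr, y_tr = make_data(400)    # design set for the base experts
X_val, y_val = make_data(400)  # independent set for training the fusion rule
X_te, y_te = make_data(4000)   # test set

# Expert A: trained on clean labels.
expert_a = LogisticRegression().fit(X_tr, y_tr)

# Expert B: trained on asymmetrically corrupted labels (half of class 1
# relabeled as class 0), so its continuous outputs are systematically biased.
y_bad = y_tr.copy()
flip = (y_bad == 1) & (rng.random(len(y_bad)) < 0.5)
y_bad[flip] = 0
expert_b = LogisticRegression().fit(X_tr, y_bad)

def expert_outputs(X):
    # Continuous (posterior-probability) outputs of both experts.
    return np.column_stack([expert_a.predict_proba(X)[:, 1],
                            expert_b.predict_proba(X)[:, 1]])

# Fixed fusion rule: simple (unweighted) average of the expert outputs.
p_simple = expert_outputs(X_te).mean(axis=1)

# Trainable fusion rule: a logistic regression over the expert outputs,
# fitted on held-out data -- one simple realization of a trained weighted
# (linear) fusion that can downweight and recalibrate the biased expert.
fuser = LogisticRegression().fit(expert_outputs(X_val), y_val)
p_trained = fuser.predict_proba(expert_outputs(X_te))[:, 1]

print("simple average fusion  :",
      accuracy_score(y_te, (p_simple > 0.5).astype(int)))
print("trained weighted fusion:",
      accuracy_score(y_te, (p_trained > 0.5).astype(int)))
```

With these settings, the trained fusion recovers most of the accuracy lost to the biased expert, whereas the simple average inherits its bias; this mirrors the situations, discussed above, in which weighted average fusion outperforms simple average fusion.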
