Use of Bayesian data reduction for the fusion of legacy classifiers

Abstract For important classification tasks there may already exist an arsenal of classification tools, representing previous attempts and best efforts at a solution. These legacy classifiers (LCs) are often useful; and although all base their decisions on the same observations, so that their decisions are strongly dependent in a way that is difficult to model, there is often some benefit in fusing them into a better corporate decision. This fusion can be viewed as building a meta-classifier that operates on data vectors whose elements are the individual LC decisions. The Bayesian data reduction algorithm, developed previously, imposes a uniform prior probability mass function on discrete symbol probabilities; in this paper it is applied to the preceding decision-fusion problem and compares favorably with a number of other expert-mixing approaches. Parameters varied include the number of relevant LCs (some may have been poorly designed, and ought to be discounted or discarded automatically), the numbers of training data and classes, and the dependence between LCs: a good fusion approach should reject redundant decisions.
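The meta-classifier idea can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: each training example is a vector of discrete LC decisions plus the true class label, per-class symbol probabilities are estimated with a uniform Dirichlet prior (add-one smoothing), and a new decision vector is assigned to the maximum-posterior class. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

def train_fusion(decisions, labels):
    """decisions: list of tuples of LC outputs; labels: true class labels."""
    counts = defaultdict(Counter)      # class -> counts of decision-vector symbols
    class_counts = Counter(labels)     # class -> number of training examples
    symbols = set(decisions)           # observed symbol alphabet
    for d, y in zip(decisions, labels):
        counts[y][d] += 1
    return counts, class_counts, symbols

def fuse(d, counts, class_counts, symbols):
    """Return the MAP class for decision vector d under a uniform Dirichlet prior."""
    best, best_score = None, float("-inf")
    n_total = sum(class_counts.values())
    M = len(symbols)                   # symbol-alphabet size
    for y, n_y in class_counts.items():
        # Posterior predictive of symbol d given class y, with
        # add-one (uniform Dirichlet) smoothing over M symbols.
        p_sym = (counts[y][d] + 1) / (n_y + M)
        score = (n_y / n_total) * p_sym   # prior x likelihood
        if score > best_score:
            best, best_score = y, score
    return best
```

For example, with two LCs whose joint decisions `(0, 0)` dominate in class 0 and `(1, 1)` in class 1, the fused decision follows the majority evidence while the smoothing keeps unseen decision vectors from receiving zero probability.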
