On the structure of strict sense Bayesian cost functions and their applications

In the context of classification problems, this paper analyzes the general structure of strict sense Bayesian (SSB) cost functions, i.e., those having a unique minimum when the soft decisions equal the posterior class probabilities. We show that any SSB cost is essentially the sum of a generalized measure of entropy, which does not depend on the targets, and an error component. Symmetric cost functions are analyzed in detail. Our results provide further insight into the behavior of this family of objective functions and serve as a starting point for exploring novel algorithms. Two applications are proposed. First, the use of asymmetric SSB cost functions for posterior probability estimation in non-maximum a posteriori (MAP) decision problems. Second, a novel entropy minimization principle for hybrid learning: labeled data are used to minimize the cost function, and unlabeled data to minimize the corresponding entropy measure.
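As a minimal illustrative sketch of both results, the code below uses cross-entropy, a standard SSB cost whose associated entropy measure is the Shannon entropy. It first checks the entropy-plus-error decomposition numerically for a single soft decision, and then trains a softmax linear classifier with the proposed hybrid objective: cross-entropy on labeled data plus a weighted Shannon entropy of the soft decisions on unlabeled data. The synthetic Gaussian data, the linear model, and the hyperparameters `lam` and `lr` are illustrative assumptions for this sketch, not taken from the paper.

```python
import numpy as np

# Decomposition check for the cross-entropy cost (illustrative):
# -sum(d * log y) = H(y) + sum((y - d) * log y), with H(y) = -sum(y * log y);
# the error term has zero conditional mean when y equals the posterior.
y = np.array([0.7, 0.2, 0.1])
d_vec = np.array([1.0, 0.0, 0.0])
lhs = -np.sum(d_vec * np.log(y))
rhs = -np.sum(y * np.log(y)) + np.sum((y - d_vec) * np.log(y))
assert np.isclose(lhs, rhs)

# Hybrid learning: labeled cross-entropy + lam * Shannon entropy on
# unlabeled soft decisions.
rng = np.random.default_rng(0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Synthetic two-class Gaussian mixture (an assumption for the demo).
n_l, n_u, dim = 40, 400, 2
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
labels = rng.integers(0, 2, n_l)
X_l = means[labels] + rng.normal(size=(n_l, dim))
D_l = np.eye(2)[labels]                       # one-hot targets
X_u = means[rng.integers(0, 2, n_u)] + rng.normal(size=(n_u, dim))

A_l = np.hstack([X_l, np.ones((n_l, 1))])     # append a bias column
A_u = np.hstack([X_u, np.ones((n_u, 1))])
W = np.zeros((dim + 1, 2))
lam, lr, eps = 0.5, 0.5, 1e-12                # assumed hyperparameters

for _ in range(500):
    Y_l = softmax(A_l @ W)
    Y_u = softmax(A_u @ W)
    # Gradient of the labeled cross-entropy term (softmax model): X^T (Y - D).
    G_l = A_l.T @ (Y_l - D_l) / n_l
    # Gradient of the unlabeled entropy term; for a softmax output,
    # dH/dz_k = -y_k * (log y_k + H(y)).
    H_u = -np.sum(Y_u * np.log(Y_u + eps), axis=1, keepdims=True)
    G_u = A_u.T @ (-Y_u * (np.log(Y_u + eps) + H_u)) / n_u
    W -= lr * (G_l + lam * G_u)

print("mean unlabeled entropy:", float(H_u.mean()))
```

The unlabeled term rewards confident (low-entropy) soft decisions, so increasing `lam` pushes the learned boundary away from the unlabeled data mass, which is the intended effect of the hybrid principle.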
