Cost functions to estimate a posteriori probabilities in multiclass problems

This paper addresses the problem of designing cost functions to estimate a posteriori probabilities in multiclass problems. We establish necessary and sufficient conditions that these costs must satisfy in one-class one-output networks whose outputs are consistent with probability laws. We then focus on a particular subset of these cost functions: those satisfying two frequently desirable properties, symmetry and separability (well-known cost functions, such as the quadratic cost and the cross-entropy, are particular cases within this subset). Finally, we present a universal stochastic gradient learning rule for single-layer networks, in the sense that it minimizes a general version of these cost functions for a wide family of nonlinear activation functions.
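
As a concrete illustration of one particular case in this family, the sketch below trains a single-layer network with softmax outputs by stochastic gradient descent on the cross-entropy cost, one of the cases the abstract names. This is a minimal hypothetical sketch, not the paper's general learning rule; the function names and learning rate are illustrative assumptions. Cross-entropy is a proper cost in the sense studied here: its expected value is minimized exactly when each output equals the corresponding a posteriori class probability.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: nonnegative outputs summing to 1,
    # as required for posterior-probability estimates.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sgd_step(W, b, x, d, lr=0.1):
    """One stochastic gradient step on the cross-entropy cost.

    W : weights, shape (n_classes, n_features)
    b : biases, shape (n_classes,)
    x : feature vector, shape (n_features,)
    d : one-hot target, shape (n_classes,)
    For softmax outputs with cross-entropy, the gradient with respect
    to the pre-activations reduces to (y - d), giving the delta rule.
    """
    y = softmax(W @ x + b)        # network outputs: estimates of P(class | x)
    err = y - d                   # gradient w.r.t. pre-activations
    W -= lr * np.outer(err, x)
    b -= lr * err
    return W, b
```

In the infinite-sample limit, minimizing this cost drives the outputs toward the true posteriors P(class | x), which is the defining property the paper characterizes for the whole family of symmetric, separable cost functions.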
