ESTIMATION OF POSTERIOR PROBABILITIES WITH NEURAL NETWORKS: APPLICATION TO MICROCALCIFICATION DETECTION IN BREAST CANCER DIAGNOSIS

Neural networks (NNs) are customarily used as classifiers aimed at minimizing classification error rates. However, NN architectures that compute soft decisions can also be used to estimate posterior class probabilities, and such estimates make it possible to implement general decision rules beyond the maximum a posteriori (MAP) criterion; for instance, a decision may be deferred to a human expert whenever no class probability exceeds a given confidence threshold. In addition, probabilities provide a confidence measure for the classifier's decisions, which is essential in applications involving high risk. This chapter is devoted to the general problem of estimating posterior class probabilities with NNs. Two components of the estimation problem are discussed: model selection and parameter learning. The analysis assumes an NN model called the generalized softmax perceptron (GSP), although most of the discussion extends easily to other schemes, such as the hierarchical mixture of experts (HME) [9], which has inspired part of our work, or even the well-known multilayer perceptron. The posterior probability estimates are applied to a medical decision support system; the testbed is the detection of microcalcifications (MCCs) in mammograms, a key step in the early diagnosis of breast cancer.

The chapter is organized as follows: Section 3.2 discusses the estimation of posterior class probabilities with NNs, with emphasis on the medical application; Section 3.3 discusses learning and model selection algorithms for GSP networks; Section 3.4 proposes a GSP-based system for MCC detection; Section 3.5 presents simulation results on detection performance using a mammogram database; and Section 3.6 provides conclusions and future trends.
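To make these two points concrete, the following minimal sketch (plain NumPy; illustrative only, not the chapter's GSP network or its learning and model selection algorithms) trains a single softmax layer with the cross-entropy cost, whose soft outputs are known to converge toward the true posteriors, and then applies a non-MAP decision rule with a reject option. The Gaussian toy data and the 0.9 confidence threshold are assumptions made for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        # Subtract the row-wise max for numerical stability.
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    # Toy two-class data: two Gaussian clusters standing in for feature
    # vectors of candidate image regions (an assumption for this sketch).
    n = 500
    X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                   rng.normal(+1.0, 1.0, size=(n, 2))])
    y = np.repeat([0, 1], n)
    Y = np.eye(2)[y]                      # one-hot targets

    # Single softmax layer trained with cross-entropy; the trained soft
    # outputs can then be read as estimates of P(class | x).
    W = np.zeros((2, 2))
    b = np.zeros(2)
    lr = 0.1
    for _ in range(500):                  # plain batch gradient descent
        P = softmax(X @ W + b)            # current posterior estimates
        G = (P - Y) / len(X)              # gradient of the mean cross-entropy
                                          # with respect to the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)

    # Non-MAP rule: decide only when the largest posterior exceeds a
    # confidence threshold; otherwise defer the case (reject option).
    P = softmax(X @ W + b)
    confident = P.max(axis=1) >= 0.9
    acc = (P.argmax(axis=1) == y)[confident].mean()
    print(f"decided {confident.mean():.0%} of cases; "
          f"accuracy on decided cases: {acc:.3f}")

The reject option is the relevant rule in a high-risk setting such as MCC detection: cases whose largest posterior falls below the threshold can be routed to a radiologist instead of being classified automatically.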

REFERENCES

[1] S. M. Kay, Fundamentals of Statistical Signal Processing, 2001.

[2] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, 1993.

[3] A. Berger, Fundamentals of Biostatistics, 1969.

[4] J. I. Arribas et al., A Radius and Ulna Skeletal Age Assessment System, 2005 IEEE Workshop on Machine Learning for Signal Processing, 2005.

[5] J. Cid-Sueiro et al., Neural architectures for parametric estimation of a posteriori probabilities by constrained conditional density functions, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, 1999.

[6] P. Smyth, On loss functions which minimize to conditional expected values and posterior probabilities, IEEE Trans. Inf. Theory, 1993.

[7] B. W. Suter et al., The multilayer perceptron as an approximation to a Bayes optimal discriminant function, IEEE Trans. Neural Networks, 1990.

[8] D. Docampo et al., Growing Gaussian mixtures network for classification applications, Signal Processing, 1999.

[9] R. A. Jacobs et al., Hierarchical Mixtures of Experts and the EM Algorithm, Neural Computation, 1993.

[10] R. Reed, Pruning algorithms - a survey, IEEE Trans. Neural Networks, 1993.

[11] J. Cid-Sueiro et al., Cost functions to estimate a posteriori probabilities in multiclass problems, IEEE Trans. Neural Networks, 1999.

[12] V. Vapnik, An overview of statistical learning theory, IEEE Trans. Neural Networks, 1999.

[13] A. H. Waibel et al., Adaptively Growing Hierarchical Mixtures of Experts, NIPS, 1996.

[14] S. Amari, Backpropagation and stochastic gradient descent method, Neurocomputing, 1993.

[15] F. Winsberg et al., Detection of Radiographic Abnormalities in Mammograms by Means of Optical Scanning and Computer Analysis, 1967.

[16] K. Chen et al., Improved learning algorithms for mixture of experts in multiclass classification, Neural Networks, 1999.

[17] K.-P. Adlassnig, Fuzzy systems in medicine, EUSFLAT Conf., 2001.

[18] B. A. Telfer et al., Energy functions for minimizing misclassification error with minimum-complexity networks, Neural Networks, 1994.

[19] A. El-Jaroudi et al., A new error criterion for posterior probability estimation with neural nets, 1990 IJCNN International Joint Conference on Neural Networks, 1990.

[20] M. L. Giger et al., Ideal observer approximation using Bayesian classification neural networks, IEEE Trans. Medical Imaging, 2001.

[21] T. L. Marzetta et al., Detection, Estimation, and Modulation Theory, 1976.

[22] G. E. Hinton et al., An Alternative Model for Mixtures of Experts, NIPS, 1994.

[23] D. A. Bell, Information Theory and Reliable Communication, 1969.

[24] B. Zheng et al., Digital mammography: mixed feature neural network with spectral entropy decision for detection of microcalcifications, IEEE Trans. Medical Imaging, 1996.

[25] E. A. Wan, Neural network classification: a Bayesian interpretation, IEEE Trans. Neural Networks, 1990.

[26] E. B. Baum et al., Supervised Learning of Probability Distributions by Neural Networks, NIPS, 1987.

[27] J. Cid-Sueiro et al., A model selection algorithm for a posteriori probability estimation with neural networks, IEEE Trans. Neural Networks, 2005.