Bayesian applications of belief networks and multilayer perceptrons for ovarian tumor classification with rejection

Incorporating prior knowledge into black-box classifiers is still largely an open problem. We propose a hybrid Bayesian methodology that encodes prior knowledge as a (Bayesian) belief network and then uses this knowledge to estimate an informative prior for a black-box model (e.g. a multilayer perceptron). We propose two technical approaches for transforming the belief network into an informative prior. The first generates samples from the most probable parameterization of the Bayesian belief network and uses them as virtual data, together with the real data, in the Bayesian learning of a multilayer perceptron. The second transforms probability distributions over belief network parameters into distributions over multilayer perceptron parameters. The essential attribute of the hybrid methodology is that it combines prior knowledge and statistical data efficiently when prior knowledge is available and the sample is of small or medium size. Additionally, we describe how the Bayesian approach can provide uncertainty information about the predictions (e.g. for classification with rejection). We demonstrate these techniques on the medical task of predicting the malignancy of ovarian masses and summarize the practical advantages of the Bayesian approach. We compare the learning curves for the hybrid methodology with those of several belief networks and multilayer perceptrons. Furthermore, we report the performance of Bayesian belief networks when they are allowed to exclude hard cases based on various measures of prediction uncertainty.
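The two ideas in the abstract, virtual data drawn from a belief network and classification with rejection, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-node network (Malignant -> HighCA125), its probabilities, the function names, and the rejection rule based on the spread of posterior predictions are hypothetical and are not taken from the paper.

```python
import random
import statistics

# Hypothetical two-node belief network: Malignant -> HighCA125.
# All probabilities are made-up illustration values, not from the paper.
P_MALIGNANT = 0.3
P_HIGH_CA125_GIVEN = {1: 0.8, 0: 0.1}  # P(HighCA125 = 1 | Malignant)


def sample_virtual_case(rng):
    """Ancestral sampling: draw the parent first, then the child given it."""
    malignant = 1 if rng.random() < P_MALIGNANT else 0
    high_ca125 = 1 if rng.random() < P_HIGH_CA125_GIVEN[malignant] else 0
    return (high_ca125, malignant)  # (feature, label)


def build_training_set(real_data, n_virtual, seed=0):
    """Append n_virtual sampled cases to the real cases."""
    rng = random.Random(seed)
    virtual = [sample_virtual_case(rng) for _ in range(n_virtual)]
    return list(real_data) + virtual


def classify_with_rejection(posterior_probs, max_std=0.15):
    """Decide from posterior samples of P(malignant | case).

    Assumed rule: reject when the posterior predictions disagree
    (standard deviation above max_std); otherwise threshold the mean at 0.5.
    """
    if statistics.pstdev(posterior_probs) > max_std:
        return "reject"
    return "malignant" if statistics.mean(posterior_probs) >= 0.5 else "benign"


real_data = [(1, 1), (0, 0), (0, 0), (1, 0)]
combined = build_training_set(real_data, n_virtual=200)
```

In this sketch, `combined` would be passed to whatever learner is used for the multilayer perceptron, and `classify_with_rejection` would receive the predictions of several posterior parameter samples for a single case. The number of virtual cases acts as a weight on the prior knowledge relative to the real data, so it is a tuning choice rather than a fixed value.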
