Data Mining and Model Simplicity: A Case Study in Diagnosis

We describe the results of data mining in a challenging medical diagnosis domain, acute abdominal pain. This domain is well known to be difficult, yielding little more than 60% predictive accuracy for most human and machine diagnosticians. Moreover, many researchers argue that one of the simplest approaches, the naive Bayesian classifier, is optimal. We compare the naive Bayesian classifier with its more general cousin, the Bayesian network classifier, and with selective Bayesian classifiers that use just 10% of the total attributes, and show that the simplest models perform at least as well as the more complex ones. We argue that simple models like the selective naive Bayesian classifier will perform as well as more complicated models in similarly complex domains with relatively small data sets, which calls into question the extra expense of inducing more complex models.
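To make the idea of a selective naive Bayesian classifier concrete, the following is a minimal sketch in Python, not the procedure used in the study. It assumes categorical attributes, add-one smoothing, and greedy forward selection scored on a held-out set; the function names (`train_nb`, `predict_nb`, `accuracy`, `select_attributes`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, attrs):
    """Fit a naive Bayesian classifier using only the attribute indices in attrs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attr, class) -> Counter of values
    domains = defaultdict(set)           # attr -> set of values seen in training
    for row, cls in zip(rows, labels):
        for a in attrs:
            value_counts[(a, cls)][row[a]] += 1
            domains[a].add(row[a])
    return class_counts, value_counts, domains, list(attrs)


def predict_nb(model, row):
    """Return the class with the highest log posterior; ties go to the first class in sorted order."""
    class_counts, value_counts, domains, attrs = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -math.inf
    for cls in sorted(class_counts):
        n = class_counts[cls]
        score = math.log(n / total)  # log prior
        for a in attrs:
            k = len(domains[a]) or 1
            # Add-one smoothing so unseen values never give log(0).
            score += math.log((value_counts[(a, cls)][row[a]] + 1) / (n + k))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum(predict_nb(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def select_attributes(train_rows, train_labels, val_rows, val_labels, n_attrs):
    """Greedy forward selection: add the attribute that most improves
    held-out accuracy, and stop when no attribute gives a strict improvement."""
    selected = []
    best = accuracy(train_nb(train_rows, train_labels, selected), val_rows, val_labels)
    while True:
        candidates = []
        for a in range(n_attrs):
            if a in selected:
                continue
            model = train_nb(train_rows, train_labels, selected + [a])
            candidates.append((accuracy(model, val_rows, val_labels), a))
        if not candidates:
            break
        # Highest accuracy wins; among equal accuracies, the lowest index wins.
        cand_acc, cand_attr = max(candidates, key=lambda t: (t[0], -t[1]))
        if cand_acc <= best:
            break
        selected.append(cand_attr)
        best = cand_acc
    return selected, best


# Toy data (made up): attribute 0 determines the class, attributes 1 and 2 are noise.
TRAIN_ROWS = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)] * 3
TRAIN_LABELS = ["a", "a", "b", "b"] * 3
VAL_ROWS = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
VAL_LABELS = ["a", "a", "b", "b"]

SELECTED, VAL_ACCURACY = select_attributes(TRAIN_ROWS, TRAIN_LABELS, VAL_ROWS, VAL_LABELS, 3)
```

On this toy data the selection step keeps only the informative attribute. The design choice being illustrated is that a naive Bayesian model restricted to a small attribute subset needs far fewer parameters than a full model, which matters most when the data set is small.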
