Computer-based probabilistic-network construction

Faced with growing amounts of data that they cannot analyze manually, biomedical researchers have increasingly turned to computational methods for exploring large databases. In particular, researchers might benefit from a nonparametric, efficient, computer-based method for determining the important associations among variables in a domain, especially when human expertise is not readily available. In this dissertation, I demonstrate that such computer-based algorithms are conceptually feasible, robust to noise, computationally efficient, and theoretically sound, and that they generate models that can classify new cases accurately. I first describe two algorithms that take as input a database of cases and optional user-supplied prior knowledge, and that generate as output a probabilistic network, specifically a belief network. The database may be incomplete and may contain noise. The resulting belief network may be used to determine important associations among variables in a poorly understood domain, or as a classifier for new cases that were not used in learning. After describing the algorithms, I present simple examples of how these programs generate a belief network from a database. I then present the results of evaluating these algorithms on databases from several domains, including gynecologic pathology, lymph-node pathology, DNA-sequence analysis, and poisonous-mushroom classification. In most cases, the belief networks classify new test cases with high accuracy. In addition to discussing empirical results, I present an overview of proofs that these algorithms are asymptotically correct: they are based on metrics that, as the number of cases in the database increases without limit, always prefer those networks that more closely approximate the true underlying distribution of the data. I conclude with a discussion of this work's contributions and a list of open research problems.
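To make the idea of a metric that prefers better-fitting networks concrete, the following is a minimal sketch and not the dissertation's implementation. The abstract does not name its metrics; the sketch assumes, purely for illustration, the Bayesian scoring metric of Cooper and Herskovits (often called the K2 metric) with uniform priors over complete discrete data. The function name `log_k2_score`, the toy database, and the variable names `x`, `y`, and `z` are all hypothetical.

```python
from collections import Counter
from math import lgamma


def log_k2_score(cases, child, parents, arity):
    """Log of the assumed K2 metric for `child` given a candidate parent set.

    cases   : list of dicts mapping variable name -> discrete value
    child   : name of the node being scored
    parents : tuple of candidate parent names (may be empty)
    arity   : dict mapping variable name -> number of possible values
    """
    r = arity[child]
    # N_ijk: number of cases with parent configuration j and child value k.
    n_ijk = Counter((tuple(c[p] for p in parents), c[child]) for c in cases)
    # N_ij: number of cases with parent configuration j (summed over k).
    n_ij = Counter()
    for (config, _value), count in n_ijk.items():
        n_ij[config] += count

    score = 0.0
    for count in n_ij.values():
        # log[(r - 1)! / (N_ij + r - 1)!], using lgamma(n + 1) = log(n!)
        score += lgamma(r) - lgamma(count + r)
    for count in n_ijk.values():
        # log(N_ijk!)
        score += lgamma(count + 1)
    return score


# Hypothetical toy database: y always copies x, while z is unrelated to y.
cases = [
    {"x": 0, "y": 0, "z": 0},
    {"x": 0, "y": 0, "z": 1},
    {"x": 1, "y": 1, "z": 0},
    {"x": 1, "y": 1, "z": 1},
] * 5
arity = {"x": 2, "y": 2, "z": 2}

score_no_parent = log_k2_score(cases, "y", (), arity)
score_parent_x = log_k2_score(cases, "y", ("x",), arity)
score_parent_z = log_k2_score(cases, "y", ("z",), arity)
```

On this toy database the score for `y` with parent `x` exceeds the score with no parents, which in turn exceeds the score with the irrelevant parent `z`. A structure-learning procedure could use such comparisons to decide which arcs to add; how the dissertation's two algorithms actually search over networks is not specified in this abstract.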
