Paper information - Computer-based probabilistic-network construction

Computer-based probabilistic-network construction

Faced with growing volumes of data that they cannot analyze manually, biomedical researchers have turned increasingly to computational methods for exploring large databases. In particular, researchers might benefit from a nonparametric, efficient, computer-based method for determining the important associations among variables in a domain, particularly when human expertise is not readily available. In this dissertation, I demonstrate that such computer-based algorithms are conceptually feasible, robust to noise, computationally efficient, and theoretically sound, and that they generate models that can classify new cases accurately. I first describe two algorithms that take as input a database of cases and optional user-supplied prior knowledge, and that generate a probabilistic network (specifically, a belief network) as output. The database may be incomplete and may contain noise. The resulting belief network may be used to determine important associations among variables in a poorly understood domain, or as a classifier for new cases that were not used in learning. After describing the algorithms, I present simple examples of how these programs generate a belief network from a database. I then present the results of evaluating these algorithms on databases from several domains, including gynecologic pathology, lymph-node pathology, DNA-sequence analysis, and poisonous-mushroom classification. In most cases, the belief networks classify new test cases with high accuracy. In addition to discussing empirical results, I present an overview of proofs that these algorithms are asymptotically correct: they are based on metrics that, as the number of cases in the database increases without limit, will always prefer those networks that more closely approximate the true underlying distribution of the data. I conclude with a discussion of this work's contributions and a list of open research problems.

Edward H. Herskovits
