Nonparametric Bayesian methods for supervised and unsupervised learning

I introduce two nonparametric Bayesian methods for solving problems of supervised and unsupervised learning. The first method simultaneously learns causal networks and causal theories from data. For example, given synthetic co-occurrence data from a simple causal model of the medical domain, it can learn relationships like "having the flu causes coughing," while also learning that observable quantities can be usefully grouped into categories like diseases and symptoms, and that diseases tend to cause symptoms, not the other way around. The second method is an online algorithm for learning a prototype-based model of categorical concepts, and can be used to solve problems of multiclass classification with missing features. I apply it to categorizing newsgroup posts and recognizing handwritten digits. These approaches were inspired by a striking capacity of human learning, which should also be a desideratum for any intelligent system: the ability to learn certain kinds of "simple" or "natural" structures very quickly, while still being able to learn arbitrary and arbitrarily complex structures given enough data. In each case, I show how nonparametric Bayesian modeling and inference based on stochastic simulation give us some of the tools we need to achieve this goal.
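The abstract does not spell out the algorithms, but the nonparametric Bayesian ingredient both methods share, a prior over partitions that strongly favors few clusters while allowing unboundedly many, can be illustrated with the Chinese restaurant process. The minimal Python sketch below is my own illustration, not code from the thesis; the names sample_crp, alpha, and seed are hypothetical.

    import random

    def sample_crp(n, alpha, seed=0):
        # Chinese restaurant process: customer i joins existing table k with
        # probability counts[k] / (i + alpha), or opens a new table with
        # probability alpha / (i + alpha). A priori the number of tables
        # (clusters) is unbounded, but small partitions dominate, so
        # "simple" structure is learned quickly while arbitrarily complex
        # structure remains reachable given enough data.
        rng = random.Random(seed)
        assignments, counts = [], []
        for i in range(n):
            weights = counts + [alpha]
            k = rng.choices(range(len(weights)), weights=weights)[0]
            if k == len(counts):
                counts.append(0)  # open a new table
            counts[k] += 1
            assignments.append(k)
        return assignments

    # With alpha = 1.0, twenty observations typically occupy only a handful
    # of clusters; raising alpha makes richer partitions more probable.
    print(sample_crp(20, alpha=1.0))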
Thesis Supervisor: Joshua B. Tenenbaum
Title: Paul E. Newton Career Development Professor