Nonparametric Bayesian Methods for Extracting Structure from Data

One desirable property of machine learning algorithms is the ability to balance the number of parameters in a model in accordance with the amount of available data. Incorporating nonparametric Bayesian priors into models is one approach to automatically adjusting model capacity to the amount of available data: with small datasets, models are less complex (require storing fewer parameters in memory), whereas with larger datasets, models are implicitly more complex (require storing more parameters in memory). Thus, nonparametric Bayesian priors satisfy frequentist intuitions about model complexity within a fully Bayesian framework. This thesis presents several novel machine learning models and applications that use nonparametric Bayesian priors. We introduce two novel models that use flat, Dirichlet process priors. The first is an infinite mixture of experts model, which builds a fully generative, joint density model of the input and output space. The second is a Bayesian biclustering model, which simultaneously organizes a data matrix into block-constant biclusters. The model is capable of efficiently processing very large, sparse matrices, enabling cluster analysis on incomplete data matrices. We introduce binary matrix factorization, a novel matrix factorization model that, in contrast to classic factorization methods such as singular value decomposition, decomposes a matrix using latent binary matrices. We describe two nonparametric Bayesian priors over tree structures. The first is an infinitely exchangeable generalization of the nested Chinese restaurant process [11] that generates data-vectors at a single node in the tree. The second is a novel, finitely exchangeable prior that generates trees by first partitioning data indices into groups and then randomly assigning groups to a tree.
We present two applications of the tree priors: the first automatically learns probabilistic stick-figure models of motion-capture data that recover plausible structure and are robust to missing marker data. The second learns hierarchical allocation models based on the latent Dirichlet allocation topic model for document corpora, where nodes in a topic-tree are latent “super-topics”, and nodes in a document-tree are latent categories. The thesis concludes with a summary of contributions, a discussion of the models and their limitations, and a brief outline of potential future research directions.
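The claim that model capacity grows with the amount of data can be illustrated with the Chinese restaurant process, the partition distribution induced by a Dirichlet process. The following is a minimal sketch, not code from the thesis: the function name `crp_table_counts`, the concentration value `alpha=1.0`, and the dataset sizes are illustrative assumptions. Each data point joins an existing cluster with probability proportional to its size or opens a new cluster with probability proportional to `alpha`, so the number of occupied clusters is not fixed in advance.

```python
import random


def crp_table_counts(n_customers, alpha, seed=0):
    """Seat n_customers sequentially under a Chinese restaurant process.

    Returns a list holding the number of customers at each occupied table.
    alpha is the concentration parameter (an illustrative choice here).
    """
    rng = random.Random(seed)
    counts = []
    for i in range(n_customers):
        # With i customers already seated, the next one joins table k with
        # probability counts[k] / (i + alpha) and opens a new table with
        # probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                break
        else:
            # r fell in the final alpha-sized slice: open a new table.
            counts.append(1)
    return counts


# Hypothetical dataset sizes, chosen only to show the trend.
for n in (10, 100, 1000, 10000):
    print(n, len(crp_table_counts(n, alpha=1.0)))
```

Under this prior the expected number of occupied clusters grows roughly as `alpha * log(n)`, so larger datasets tend to yield more clusters, and hence more parameters, without the number being set by hand.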

[1]  A. Edwards,et al.  Estimation of the Branch Points of a Branching Diffusion Process , 1970 .

[2]  J. Hartigan Direct Clustering of a Data Matrix , 1972 .

[3]  D. Blackwell,et al.  Ferguson Distributions Via Polya Urn Schemes , 1973 .

[4]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[5]  T. Ferguson Prior Distributions on Spaces of Probability Measures , 1974 .

[6]  C. Antoniak Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems , 1974 .

[7]  B. Silverman,et al.  Some Aspects of the Spline Smoothing Approach to Non‐Parametric Regression Curve Fitting , 1985 .

[8]  École d'été de probabilités de Saint-Flour,et al.  École d'été de probabilités de Saint-Flour XIII - 1983 , 1985 .

[9]  S. Wasserman,et al.  Stochastic a posteriori blockmodels: Construction and assessment , 1987 .

[10]  T. Schedl,et al.  fog-2, a germ-line-specific sex determination gene required for hermaphrodite spermatogenesis in Caenorhabditis elegans. , 1988, Genetics.

[11]  J. Sethuraman A Constructive Definition of Dirichlet Priors , 1991 .

[12]  Radford M. Neal Bayesian Mixture Modeling by Monte Carlo Simulation , 1991 .

[13]  John R. Anderson,et al.  The Adaptive Nature of Human Categorization. , 1991 .

[14]  W. Gilks,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 1992 .

[15]  W. Sudderth,et al.  Polya Trees and Random Distributions , 1992 .

[16]  M. West,et al.  Hyperparameter estimation in Dirichlet process mixture models , 1992 .

[17]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[18]  Michael I. Jordan,et al.  Supervised learning from incomplete data via an EM approach , 1993, NIPS.

[19]  David Aldous,et al.  Tree-based models for random distribution of mass , 1993 .

[20]  John R. Anderson,et al.  The Adaptive Character of Thought , 1990 .

[21]  P. T. Szymanski,et al.  Adaptive mixtures of local experts are source coding solutions , 1993, IEEE International Conference on Neural Networks.

[22]  Geoffrey E. Hinton,et al.  An Alternative Model for Mixtures of Experts , 1994, NIPS.

[23]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[24]  M. Lavine More Aspects of Polya Tree Distributions for Statistical Modelling , 1992 .

[25]  M. Escobar Estimating Normal Means with a Dirichlet Process Prior , 1994 .

[26]  S. MacEachern Estimating normal means with a conjugate style Dirichlet process prior , 1994 .

[27]  R. Zemel,et al.  Learning sparse multiple cause models , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[28]  M. Escobar,et al.  Bayesian Density Estimation and Inference Using Mixtures , 1995 .

[29]  Eric Saund,et al.  A Multiple Cause Mixture Model for Unsupervised Learning , 1995, Neural Computation.

[30]  S. MacEachern,et al.  A semiparametric Bayesian model for randomised block designs , 1996 .

[31]  Jun S. Liu Nonparametric hierarchical Bayes via sequential imputations , 1996 .

[32]  J. Pitman,et al.  The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator , 1997 .

[33]  P. Green,et al.  On Bayesian Analysis of Mixtures with an Unknown Number of Components (with discussion) , 1997 .

[34]  S. MacEachern,et al.  Estimating mixture of Dirichlet process models , 1998 .

[35]  Christopher M. Bishop,et al.  GTM: The Generative Topographic Mapping , 1998, Neural Computation.

[36]  G. Tomlinson Analysis of densities , 1998 .

[37]  Jun S. Liu,et al.  Sequential importance sampling for nonparametric Bayes models: The next generation , 1999 .

[38]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[39]  R. Tibshirani,et al.  Clustering methods for the analysis of DNA microarray data , 1999 .

[40]  Carl E. Rasmussen,et al.  The Infinite Gaussian Mixture Model , 1999, NIPS.

[41]  Purushottam W. Laud,et al.  Bayesian Nonparametric Inference for Random Distributions and Related Functions , 1999 .

[42]  Miguel Á. Carreira-Perpiñán One-to-many mappings, continuity constraints and latent variable models , 1999 .

[43]  Christopher K. I. Williams A MCMC Approach to Hierarchical Mixture Modelling , 1999, NIPS.

[44]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[45]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[46]  S. MacEachern Decision Theoretic Aspects of Dependent Nonparametric Processes , 2000 .

[47]  L. Lazzeroni Plaid models for gene expression data , 2000 .

[48]  H. Ishwaran,et al.  Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models , 2000 .

[49]  Miguel Á. Carreira-Perpiñán,et al.  Continuous latent variable models for dimensionality reduction and sequential data reconstruction , 2001 .

[50]  Yang Song,et al.  Learning probabilistic structure for human motion detection , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[51]  Alan E. Gelfand,et al.  Spatial Nonparametric Bayesian Models , 2001 .

[52]  Lancelot F. James,et al.  Gibbs Sampling Methods for Stick-Breaking Priors , 2001 .

[53]  Carl E. Rasmussen,et al.  Infinite Mixtures of Gaussian Process Experts , 2001, NIPS.

[54]  J. Bernardo The Concept of Exchangeability and its Applications , 2001 .

[55]  Carl E. Rasmussen,et al.  Factorial Hidden Markov Models , 1997 .

[56]  Trevor Hastie,et al.  Imputing Missing Data for Gene Expression Arrays , 2001 .

[57]  P. Green,et al.  Modelling Heterogeneity With and Without the Dirichlet Process , 2001 .

[58]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[59]  Mario Medvedovic,et al.  Bayesian infinite mixture model based clustering of gene expression profiles , 2002, Bioinform..

[60]  E. Otranto,et al.  A Nonparametric Bayesian Approach to Detect the Number of Regimes in Markov Switching Models , 2002 .

[61]  W. Johnson,et al.  Modeling Regression Error With a Mixture of Polya Trees , 2002 .

[62]  Peter D. Hoff,et al.  Identifying Carriers of a Genetic Modifier Using Nonparametric Bayesian Methods , 2002 .

[63]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[64]  Peter Müller,et al.  ANOVA DDP Models: A Review , 2003 .

[65]  Shin Ishii,et al.  A Bayesian missing value estimation method for gene expression profile data , 2003, Bioinform..

[66]  Radford M. Neal,et al.  Density Modeling and Clustering Using Dirichlet Diffusion Trees , 2003 .

[67]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[68]  Thomas L. Griffiths,et al.  Integrating Topics and Syntax , 2004, NIPS.

[69]  T. H. Bø,et al.  LSimpute: accurate estimation of missing values in microarray data with least squares methods. , 2004, Nucleic acids research.

[70]  S. MacEachern,et al.  An ANOVA Model for Dependent Random Measures , 2004 .

[71]  F. Dellaert,et al.  Dirichlet Process based Bayesian Partition Models for Robot Topological Mapping , 2004 .

[72]  Michael I. Jordan,et al.  Variational methods for the Dirichlet process , 2004, ICML.

[73]  P. Müller,et al.  A method for combining inference across related nonparametric Bayesian models , 2004 .

[74]  Radford M. Neal,et al.  A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model , 2004 .

[75]  Paul Fearnhead,et al.  Particle filters for mixture models with an unknown number of components , 2004, Stat. Comput..

[76]  P. Földiák,et al.  Forming sparse representations by local anti-Hebbian learning , 1990, Biological Cybernetics.

[77]  John R. Anderson,et al.  Explorations of an Incremental, Bayesian Algorithm for Categorization , 1992, Machine Learning.

[78]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[79]  P. Müller,et al.  A Bayesian mixture model for differential gene expression , 2005 .

[80]  Thomas L. Griffiths,et al.  Interpolating between types and tokens by estimating power-law generators , 2005, NIPS.

[81]  David A. Forsyth,et al.  Skeletal parameter estimation from optical motion capture data , 2004, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[82]  Thomas L. Griffiths,et al.  Infinite latent feature models and the Indian buffet process , 2005, NIPS.

[83]  Simon Osindero,et al.  An Alternative Infinite Mixture Of Gaussian Process Experts , 2005, NIPS.

[84]  S. MacEachern,et al.  Bayesian Nonparametric Spatial Modeling With Dirichlet Process Mixing , 2005 .

[85]  S. Roweis,et al.  Time-Varying Topic Models using Dependent Dirichlet Processes , 2005 .

[86]  J. E. Griffin,et al.  Order-Based Dependent Dirichlet Processes , 2006 .

[87]  Max Welling,et al.  Gibbs Sampling for (Coupled) Infinite Mixture Models in the Stick Breaking Representation , 2006, UAI.

[88]  J. Tenenbaum,et al.  Theory-based Bayesian models of inductive learning and reasoning , 2006, Trends in Cognitive Sciences.

[89]  Marc Pollefeys,et al.  Automatic Kinematic Chain Building from Feature Trajectories of Articulated Objects , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[90]  Michael I. Jordan,et al.  Nonparametric empirical Bayes for the Dirichlet process mixture model , 2006, Stat. Comput..

[91]  Eric P. Xing,et al.  Hidden Markov Dirichlet Process: Modeling Genetic Recombination in Open Ancestral Space , 2006, NIPS.

[92]  Michael A. West,et al.  Hierarchical priors and mixture models, with applications in regression and density estimation , 2006 .

[93]  Thomas L. Griffiths,et al.  Learning Systems of Concepts with an Infinite Relational Model , 2006, AAAI.

[94]  J. Pella,et al.  The Gibbs and split-merge sampler for population mixture analysis from genetic data with incomplete baselines , 2006 .

[95]  J. Pitman Combinatorial Stochastic Processes , 2006 .

[96]  John D. Lafferty,et al.  Dynamic topic models , 2006, ICML.

[97]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[98]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[99]  Brendan J. Frey,et al.  Matrix Tile Analysis , 2006, UAI.

[100]  Yee Whye Teh,et al.  A Bayesian Interpretation of Interpolated Kneser-Ney , 2006 .

[101]  Thomas L. Griffiths,et al.  A Non-Parametric Bayesian Method for Inferring Hidden Causes , 2006, UAI.

[102]  Wei Li,et al.  Mixtures of hierarchical topics with Pachinko allocation , 2007, ICML '07.

[103]  Adam N. Sanborn,et al.  Unifying rational models of categorization via the hierarchical Dirichlet process , 2019 .

[104]  G. Roberts,et al.  Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models , 2007, 0710.4228.

[105]  S. Roweis,et al.  Nonparametric Bayesian Biclustering , 2007 .

[106]  Christopher Joseph Pal,et al.  Analyzing in situ gene expression in the mouse brain with image registration, feature extraction and block clustering , 2007, BMC Bioinformatics.

[107]  Yee Whye Teh,et al.  Bayesian Agglomerative Clustering with Coalescents , 2007, NIPS.

[108]  Thomas Hofmann,et al.  A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments , 2007 .

[109]  P. Eric,et al.  A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data , 2007 .

[110]  Roded Sharan,et al.  Bayesian haplotype inference via the Dirichlet process , 2004, ICML.

[111]  B. Schölkopf,et al.  Modeling Dyadic Data with Binary Latent Factors , 2007 .

[112]  Michael I. Jordan,et al.  Hierarchical Beta Processes and the Indian Buffet Process , 2007, AISTATS.

[113]  Yee Whye Teh,et al.  Collapsed Variational Dirichlet Process Mixture Models , 2007, IJCAI.

[114]  Wei Li,et al.  Nonparametric Bayes Pachinko Allocation , 2007, UAI.

[115]  Michael I. Jordan,et al.  Learning Multiscale Representations of Natural Scenes Using Dirichlet Processes , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[116]  Yee Whye Teh,et al.  Stick-breaking Construction for the Indian Buffet Process , 2007, AISTATS.

[117]  Dan Klein,et al.  The Infinite PCFG Using Hierarchical Dirichlet Processes , 2007, EMNLP.

[118]  Radford M. Neal,et al.  Splitting and merging components of a nonconjugate Dirichlet process mixture model , 2007 .

[119]  John D. Lafferty,et al.  A correlated topic model of Science , 2007, 0708.3601.

[120]  A. R. Ferreira da Silva A Dirichlet process mixture model for brain MRI tissue classification. , 2007, Medical image analysis.

[121]  Roland Memisevic,et al.  Non-linear Latent Factor Models for Revealing Structure in High-dimensional Data , 2008 .

[122]  Richard S. Zemel,et al.  Learning stick-figure models using nonparametric Bayesian priors over trees , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[123]  Jason A. Duan,et al.  Modeling Disease Incidence Data with Spatial and Spatio Temporal Dirichlet Process Mixtures , 2008, Biometrical journal. Biometrische Zeitschrift.

[124]  Babak Shahbaba,et al.  Nonlinear Models Using Dirichlet Process Mixtures , 2007, J. Mach. Learn. Res..

[125]  Runze Li,et al.  Mixture of Gaussian Processes and its Applications , 2010 .