Nonparametric Bayesian discrete latent variable models for unsupervised learning

The analysis of real-world problems often requires robust and flexible models that can accurately represent the structure in the data. Nonparametric Bayesian priors allow the construction of such models for complex real-world data. Nonparametric models, despite their name, can be defined as models with infinitely many parameters. This thesis is about two types of nonparametric models. The first is latent class models (i.e. mixture models) with infinitely many classes, which we construct using Dirichlet process mixtures (DPM). The second is discrete latent feature models with infinitely many features, for which we use the Indian buffet process (IBP), which can be seen as generalizing the latent class setting of the DPM to latent features. Analytical inference is not possible in the models discussed in this thesis. The use of conjugate priors can often make inference more tractable, but for a given model the family of conjugate priors may not be rich enough. Methodologically, this thesis relies on Markov chain Monte Carlo (MCMC) techniques for inference, especially those that can be used in the absence of conjugacy.

Chapter 2 introduces the basic terminology and notation used in the thesis. Chapter 3 presents the Dirichlet process (DP) and some infinite latent class models that use the DP as a prior. We first summarize different approaches for defining the DP and describe several established MCMC algorithms for inference in DPM models. The Dirichlet process mixture of Gaussians (DPMoG) model has been used extensively for density estimation. We present an empirical comparison of conjugate and conditionally conjugate priors in the DPMoG, demonstrating that the latter can give better density estimates without significant additional computational cost. The mixtures of factor analyzers (MFA) model allows data to be modeled as a mixture of Gaussians with a reduced parametrization.
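The infinitely-many-classes behavior of the DPM can be illustrated by its predictive rule, the Chinese restaurant process. The sketch below is not code from the thesis; the function name `crp_partition` and the parameter values are illustrative. Each item joins an existing class with probability proportional to that class's size, or opens a new class with probability proportional to the concentration parameter `alpha`, so the number of occupied classes grows with the data rather than being fixed in advance:

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample class assignments for n items from the Chinese restaurant
    process: item i joins an existing class with probability proportional
    to its current size, or opens a new class with probability
    proportional to alpha."""
    rng = random.Random(seed)
    assignments = []
    counts = []                      # current number of items per class
    for _ in range(n):
        r = rng.uniform(0, sum(counts) + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r <= acc:             # landed in an existing class
                counts[k] += 1
                assignments.append(k)
                break
        else:                        # r fell in the alpha slice: new class
            assignments.append(len(counts))
            counts.append(1)
    return assignments

print(crp_partition(8, 1.0))
```

Larger `alpha` yields more classes on average; the expected number of classes after n items grows like alpha * log(n), which is why such models adapt their complexity to the data.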
We present the formulation of a nonparametric form of the MFA model, the Dirichlet process MFA (DPMFA). We use the DPMFA to cluster the action potentials of different neurons from extracellular recordings, a problem known as spike sorting.

Chapter 4 presents the IBP and some infinite latent feature models that use the IBP as a prior. The IBP is a distribution over binary matrices with infinitely many columns. We describe different approaches for defining the distribution and present new MCMC techniques that can be used for inference in models that use it as a prior. Empirical results on a conjugate model show that the new methods perform as well as the established Gibbs sampling approach, but without the requirement of conjugacy. We demonstrate the performance of a non-conjugate IBP model by successfully learning the latent features of handwritten digits. Finally, we formulate a nonparametric version of the elimination-by-aspects (EBA) choice model using the IBP, and show that it can make accurate predictions about people's choice outcomes in a paired comparison task.
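The IBP's culinary metaphor suggests a direct generative sketch of such a binary matrix. The following is an illustrative implementation, not code from the thesis (the names `sample_ibp` and `_poisson` are my own): customer i takes each previously sampled dish k with probability m_k / i, where m_k counts how many earlier customers took it, and then tries a Poisson(alpha / i) number of new dishes, so rows share popular features while the number of columns stays unbounded:

```python
import math
import random

def _poisson(rng, lam):
    """Knuth's Poisson sampler: multiply uniforms until below exp(-lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def sample_ibp(n, alpha, seed=0):
    """Sample an n-row binary feature matrix from the Indian buffet
    process. m[k] tracks how many customers so far took dish k."""
    rng = random.Random(seed)
    rows, m = [], []
    for i in range(1, n + 1):
        row = [0] * len(m)
        for k in range(len(m)):      # revisit existing dishes
            if rng.random() < m[k] / i:
                row[k] = 1
                m[k] += 1
        for _ in range(_poisson(rng, alpha / i)):
            row.append(1)            # open a brand-new dish
            m.append(1)
        rows.append(row)
    width = len(m)                   # pad earlier rows to the final width
    return [r + [0] * (width - len(r)) for r in rows]

for row in sample_ibp(6, 2.0):
    print(row)
```

Unlike the DPM, where each object belongs to exactly one class, a row here can switch on several features at once, which is what makes the IBP a prior over latent feature rather than latent class models.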
