The Indian Buffet Process: Scalable Inference and Extensions

Many unsupervised learning problems seek to identify hidden features underlying a set of observations. In many real-world settings, the number of hidden features is unknown a priori. The Indian Buffet Process (IBP) avoids the need to specify this number in advance: it is a nonparametric latent feature model that places no bound on the number of active features in a dataset. While elegant, the IBP has lacked efficient inference procedures, which has prevented its application to large-scale problems. The core contribution of this thesis is a set of three new inference procedures that allow inference in the IBP to be scaled from a few hundred to 100,000 observations. The thesis contains three parts: (1) an introduction to the IBP and a review of inference techniques and extensions, in which the first chapters summarise three constructions for the IBP and review all currently published inference techniques, and Appendix C reviews extensions of the IBP to date; (2) novel techniques for scalable Bayesian inference, namely (a) an accelerated Gibbs sampler for efficient Bayesian inference in a broad class of conjugate models, (b) a parallel, asynchronous Gibbs sampler that distributes the accelerated Gibbs sampler across multiple processors, and (c) a variational inference procedure for the IBP; and (3) a framework for structured nonparametric latent feature models, extending the IBP to model more sophisticated relationships between co-occurring hidden features and providing a general framework for correlated nonparametric feature models.
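
To make the model concrete, the following is a minimal sketch (in Python with NumPy; the function name sample_ibp and its parameters are illustrative, not from the thesis) of the IBP's culinary generative process: the n-th customer samples each previously tried dish k with probability m_k/n, where m_k is the number of earlier customers who took dish k, and then tries a Poisson(alpha/n) number of new dishes.

```python
import numpy as np

def sample_ibp(num_customers, alpha, seed=None):
    """Sample a binary feature matrix Z from the Indian Buffet Process.

    Rows are customers (observations), columns are dishes (latent
    features); the number of columns is itself random, so the model
    places no a-priori bound on the number of active features.
    """
    rng = np.random.default_rng(seed)
    dish_counts = []   # dish_counts[k] = customers who took dish k so far
    choices = []       # per-customer lists of chosen dish indices

    for n in range(1, num_customers + 1):
        taken = []
        # Each existing dish k is taken with probability m_k / n.
        for k in range(len(dish_counts)):
            if rng.random() < dish_counts[k] / n:
                dish_counts[k] += 1
                taken.append(k)
        # Customer n then tries Poisson(alpha / n) brand-new dishes.
        for _ in range(rng.poisson(alpha / n)):
            dish_counts.append(1)
            taken.append(len(dish_counts) - 1)
        choices.append(taken)

    # Assemble the (num_customers x num_dishes) binary matrix Z.
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for n, taken in enumerate(choices):
        Z[n, taken] = 1
    return Z

# Example: the expected number of active features grows as alpha * H_N,
# where H_N is the N-th harmonic number.
Z = sample_ibp(num_customers=100, alpha=2.0, seed=0)
print(Z.shape)
```

In a full model, Z would be paired with a likelihood (for example, a linear-Gaussian model in which the data are generated as X = ZA plus noise), and the inference procedures above target the posterior over Z and the feature parameters.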
