Bayesian Nonparametric Latent Feature Models

Priors for Bayesian nonparametric latent feature models were originally developed a little over five years ago, sparking interest in a new type of Bayesian nonparametric model. Since then, research on these priors has focused on three main areas: extensions and generalizations of the priors, inference algorithms, and applications. This dissertation summarizes our work advancing the state of the art in all three areas. In the first area, we present a non-exchangeable framework for generalizing and extending the original priors, allowing more prior knowledge to be incorporated into nonparametric priors. Within this framework, we introduce four concrete generalizations that apply when prior knowledge about the relationships among objects can be captured by either a tree or a chain. We discuss how to derive these priors and how to perform posterior inference in models that use them. In the area of inference algorithms, we present the first variational approximation for one class of these priors and demonstrate the regimes in which it might be preferred over more traditional MCMC approaches. Finally, we present an application of basic nonparametric latent feature models to link prediction, as well as applications of our non-exchangeable priors to tree-structured choice models and human genomic data.
