A Tutorial on the Dirichlet Process for Engineers
Technical Report

This document provides a review of the Dirichlet process originally given in the author's preliminary exam paper and is presented here as a tutorial. No motivation is given (in what I've excerpted here); this document is intended to be a mathematical tutorial that is still accessible to the engineer.

I. THE DIRICHLET DISTRIBUTION

Consider the finite, D-dimensional vector π having the properties 0 ≤ π_i ≤ 1 for i = 1, …, D and ∑_{i=1}^D π_i = 1 (i.e., π resides on the (D − 1)-dimensional simplex in R^D, written π ∈ ∆_D). We view this vector as the parameter of the multinomial distribution, where samples X ∼ Mult(π) take values X ∈ {1, …, D} with probability P(X = i | π) = π_i. When the vector π is unknown, it can be inferred in the Bayesian setting using its conjugate prior, the Dirichlet distribution. The Dirichlet distribution of dimension D is a continuous probability measure on ∆_D with parameters β_1, …, β_D > 0, having the density function

p(π | β_1, …, β_D) = [ Γ(∑_{i=1}^D β_i) / ∏_{i=1}^D Γ(β_i) ] ∏_{i=1}^D π_i^{β_i − 1}.
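The conjugacy described above can be illustrated with a minimal numerical sketch: draw π from a Dirichlet prior, sample multinomial observations given π, and form the posterior by adding the observed counts to the prior parameters. The particular values of D, β, and N below are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior Dirichlet parameters for a D = 3 dimensional multinomial (illustrative values).
beta = np.array([1.0, 2.0, 3.0])
D = len(beta)

# Draw a probability vector pi ~ Dir(beta); it lies on the (D-1)-simplex.
pi = rng.dirichlet(beta)
assert np.all(pi >= 0) and np.isclose(pi.sum(), 1.0)

# Draw N i.i.d. multinomial observations X in {0, ..., D-1} with P(X = i | pi) = pi_i.
N = 100
x = rng.choice(D, size=N, p=pi)
counts = np.bincount(x, minlength=D)

# Conjugacy: the posterior over pi is again Dirichlet, with parameters beta_i + n_i,
# where n_i is the number of observations equal to i.
beta_posterior = beta + counts
```

Because the posterior stays in the Dirichlet family, repeated observations simply accumulate counts onto β, which is what makes the Dirichlet prior convenient for multinomial inference.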
