Negative Binomial Process Count and Mixture Modeling

The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for mixture modeling and whose marginalization leads to an NB process for count modeling. A draw from the NB process consists of a Poisson distributed finite number of distinct atoms, each of which is associated with a logarithmic distributed number of data samples. We reveal relationships between various count- and mixture-modeling distributions and construct a Poisson-logarithmic bivariate distribution that connects the NB and Chinese restaurant table distributions. Fundamental properties of the models are developed, and we derive efficient Bayesian inference. It is shown that with augmentation and normalization, the NB process and gamma-NB process can be reduced to the Dirichlet process and hierarchical Dirichlet process, respectively. These relationships highlight theoretical, structural, and computational advantages of the NB process. A variety of NB processes, including the beta-geometric, beta-NB, marked-beta-NB, marked-gamma-NB and zero-inflated-NB processes, with distinct sharing mechanisms, are also constructed. These models are applied to topic modeling, with connections made to existing algorithms under Poisson factor analysis. Example results show the importance of inferring both the NB dispersion and probability parameters.
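The two constructions of the NB distribution mentioned above (a gamma-mixed Poisson rate, and a Poisson number of atoms each carrying a logarithmic-distributed count) can be checked numerically. The following is a minimal sketch and not the authors' inference code. It assumes the parameterization NB(r, p) with mean r p / (1 - p) and variance r p / (1 - p)^2, which the abstract does not state explicitly. All function names are hypothetical, and the samplers are simple standard-library routines chosen for self-containment rather than efficiency.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_logarithmic(p, rng):
    """Inverse-CDF draw from Log(p): P(u = k) = -p**k / (k * log(1 - p)), k >= 1."""
    u = rng.random()
    k = 1
    mass = -p / math.log1p(-p)
    cumulative = mass
    # The mass > 0 guard stops the loop if rounding keeps cumulative below u.
    while u > cumulative and mass > 0.0:
        k += 1
        mass *= p * (k - 1) / k  # ratio of consecutive logarithmic masses
        cumulative += mass
    return k


def nb_gamma_poisson(r, p, rng):
    """Assumed form: n ~ Poisson(lam), lam ~ Gamma(shape=r, scale=p / (1 - p))."""
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


def nb_compound_poisson(r, p, rng):
    """Assumed form: n is the sum of l Log(p) draws, l ~ Poisson(-r * log(1 - p))."""
    num_atoms = sample_poisson(-r * math.log1p(-p), rng)
    return sum(sample_logarithmic(p, rng) for _ in range(num_atoms))


def empirical_mean_var(sampler, r, p, draws, seed):
    """Monte Carlo mean and variance of `draws` samples from `sampler`."""
    rng = random.Random(seed)
    xs = [sampler(r, p, rng) for _ in range(draws)]
    mean = sum(xs) / draws
    var = sum((x - mean) ** 2 for x in xs) / draws
    return mean, var


if __name__ == "__main__":
    r, p = 2.0, 0.5  # illustrative values only
    print("target mean and variance:", r * p / (1 - p), r * p / (1 - p) ** 2)
    print("gamma-Poisson:   ", empirical_mean_var(nb_gamma_poisson, r, p, 20000, 0))
    print("compound Poisson:", empirical_mean_var(nb_compound_poisson, r, p, 20000, 1))
```

Under these assumptions both samplers should produce empirical moments close to the NB targets, which illustrates why the number of atoms in the compound Poisson view is Poisson distributed while the per-atom counts are logarithmic.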
