Bayesian nonparametric temporal dynamic clustering via autoregressive Dirichlet priors

In this paper we consider the problem of dynamic clustering, where cluster memberships may change over time and clusters may split and merge, thus creating new clusters and destroying existing ones. We propose a Bayesian nonparametric approach to dynamic clustering via mixture modeling. Our approach relies on a novel time-dependent nonparametric prior defined by combining: i) a copula-based transformation of a Gaussian autoregressive process; and ii) the stick-breaking construction of the Dirichlet process. Posterior inference is performed through a particle Markov chain Monte Carlo algorithm which is simple, computationally efficient and scalable to massive datasets. Advantages of the proposed approach include flexibility in applications, ease of computation and interpretability. We present an application of our dynamic Bayesian nonparametric mixture model to the study of the temporal dynamics of gender stereotypes in adjectives and occupations in the 20th and 21st centuries in the United States. Moreover, to highlight the flexibility of our model, we present additional applications to time-dependent data with covariates and with spatial structure.
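The prior construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every specific below is an assumption made for illustration: the latent process is taken to be a stationary Gaussian AR(1) with standard normal marginals, the copula step maps it to Beta(1, alpha) stick-breaking variables through the normal CDF and the Beta(1, alpha) inverse CDF, and the infinite sum is truncated at a fixed number of atoms. The function name and the parameters `alpha`, `rho`, `n_atoms`, and `seed` are hypothetical.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF, applied elementwise (uses math.erf)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def ar_stick_breaking_weights(n_times, n_atoms, alpha=1.0, rho=0.9, seed=0):
    """Simulate time-dependent mixture weights (illustrative assumptions only).

    Returns an array of shape (n_times, n_atoms) whose rows sum to one.
    """
    rng = np.random.default_rng(seed)

    # Assumed latent process: one stationary Gaussian AR(1) path per atom,
    # scaled so that every z[t, k] is marginally N(0, 1).
    z = np.empty((n_times, n_atoms))
    z[0] = rng.standard_normal(n_atoms)
    for t in range(1, n_times):
        noise = rng.standard_normal(n_atoms)
        z[t] = rho * z[t - 1] + math.sqrt(1.0 - rho ** 2) * noise

    # Copula step: N(0, 1) -> Uniform(0, 1) -> Beta(1, alpha).
    # The Beta(1, alpha) inverse CDF is 1 - (1 - u)^(1 / alpha).
    u = normal_cdf(z)
    v = 1.0 - (1.0 - u) ** (1.0 / alpha)

    # Truncation: the last stick takes all remaining mass.
    v[:, -1] = 1.0

    # Stick-breaking: w_k = v_k * prod_{j < k} (1 - v_j), at each time point.
    remaining = np.cumprod(1.0 - v, axis=1)
    remaining = np.hstack([np.ones((n_times, 1)), remaining[:, :-1]])
    return v * remaining


weights = ar_stick_breaking_weights(n_times=5, n_atoms=10)
```

Under these assumptions each row of `weights` is marginally a truncated Dirichlet process stick-breaking draw, while `rho` controls how strongly the weights at consecutive time points are correlated.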
