Exponential family mixed membership models for soft clustering of multivariate data

For several years, model-based clustering methods have successfully addressed many of the challenges posed by data analysts. However, as the scope of data analysis has widened, some problems now fall outside the standard mixture model framework. One such problem arises when observations in a dataset come from overlapping clusters, in which different clusters have similar parameters for several variables. In this setting, mixed membership models, a soft clustering approach in which observations are not restricted to membership of a single cluster, have proved to be an effective tool. This paper outlines a method for fitting mixed membership models to data generated by a member of an exponential family. The method is applied to count data obtained from an ultra-running competition and compared with a standard mixture model approach.
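To make the contrast between the two model classes concrete, the following is a minimal generative sketch, not the fitting method of the paper. It assumes one common mixed membership formulation: each observation draws its own membership vector from a Dirichlet distribution, and each variable of that observation is then generated from a profile chosen according to that vector. The Poisson distribution stands in for the exponential family member because the application involves count data. The function names, the `rates` matrix, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_mixture(n, rates, weights, rng):
    """Standard mixture: one cluster label per observation (hard membership).

    rates has shape (G, M): Poisson mean for each of G clusters and M variables.
    """
    n_clusters, _ = rates.shape
    z = rng.choice(n_clusters, size=n, p=weights)  # shape (n,)
    x = rng.poisson(rates[z])                      # shape (n, M)
    return z, x


def simulate_mixed_membership(n, rates, alpha, rng):
    """Mixed membership: one profile label per observation AND per variable.

    alpha is the Dirichlet parameter (length G) for the membership vectors.
    """
    n_profiles, n_vars = rates.shape
    # Each observation gets its own membership vector on the simplex.
    tau = rng.dirichlet(alpha, size=n)             # shape (n, G)
    # Each variable of observation i picks a profile with probabilities tau[i].
    z = np.array(
        [rng.choice(n_profiles, size=n_vars, p=tau[i]) for i in range(n)]
    )                                              # shape (n, M)
    # Entry [i, m] uses the rate of profile z[i, m] for variable m.
    x = rng.poisson(rates[z, np.arange(n_vars)])   # shape (n, M)
    return tau, z, x


# Hypothetical example: 2 profiles, 3 count variables, 5 observations.
rng = np.random.default_rng(0)
rates = np.array([[2.0, 10.0, 4.0],
                  [8.0, 10.0, 1.0]])  # profiles share the rate of variable 2
labels, x_mix = simulate_mixture(5, rates, np.array([0.5, 0.5]), rng)
tau, z, x_mm = simulate_mixed_membership(5, rates, np.array([0.5, 0.5]), rng)
```

In the mixture simulation every row of `x_mix` is generated from a single row of `rates`, whereas in the mixed membership simulation the rows of `z` may contain different profile labels for different variables, which is what allows an observation to sit between overlapping clusters.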
