A review of multivariate distributions for count data derived from the Poisson distribution

The Poisson distribution has been widely studied and used for modeling univariate count-valued data. However, multivariate generalizations of the Poisson distribution that permit dependencies have been far less popular. Yet real-world, high-dimensional, count-valued data, such as word counts, genomics data, and crime statistics, exhibit rich dependencies that motivate the need for multivariate distributions that can appropriately model such data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: (1) those in which the marginal distributions are Poisson, (2) those in which the joint distribution is a mixture of independent multivariate Poisson distributions, and (3) those in which the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. We then empirically compare multiple models from each class on three real-world datasets with varying characteristics from different domains: traffic accident data, biological next-generation sequencing data, and text data. These experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution derived from the Poisson. Finally, we suggest new research directions, which are explored further in the Discussion section.

WIREs Comput Stat 2017, 9:e1398. doi: 10.1002/wics.1398
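The three classes differ most visibly in the kinds of dependence they allow, which a small simulation can make concrete. The sketch below is illustrative only and is not the review's code: it assumes one familiar representative per class, namely a common-shock (trivariate reduction) bivariate Poisson for class 1, a bivariate Poisson-lognormal mixture for class 2, and a pairwise Poisson graphical model sampled by Gibbs sampling for class 3. All function names and parameter values are hypothetical choices, and only the Python standard library is used. The common-shock model has Poisson marginals but covariance equal to the shared rate, so it cannot express negative dependence; the Poisson-lognormal mixture can; and the Poisson graphical model is normalizable only when its pairwise interaction parameters are non-positive.

```python
import math
import random
from statistics import mean


def poisson(rng, lam):
    """One Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam].
    Exact but O(lam) per draw, so it is only meant for the small rates used here."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def common_shock_pair(rng, lam0, lam1, lam2):
    """Class 1 (Poisson marginals), assumed common-shock form:
    X1 = Y0 + Y1, X2 = Y0 + Y2 with independent Poisson Y's, giving
    Poisson(lam0 + lam1) and Poisson(lam0 + lam2) marginals and Cov = lam0 >= 0."""
    y0 = poisson(rng, lam0)
    return y0 + poisson(rng, lam1), y0 + poisson(rng, lam2)


def poisson_lognormal_pair(rng, mu, sigma, rho):
    """Class 2 (mixture), assumed Poisson-lognormal form: draw a correlated
    Gaussian latent Z, then independent Poisson(exp(Z_j)) counts.
    A negative rho induces negative dependence between the counts."""
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = mu + sigma * e1
    z2 = mu + sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
    return poisson(rng, math.exp(z1)), poisson(rng, math.exp(z2))


def poisson_mrf_gibbs(rng, theta, theta_pair, n_sweeps):
    """Class 3 (node-conditionals), assumed pairwise Poisson graphical model:
    X_j | X_-j ~ Poisson(exp(theta_j + sum_k theta_pair[j][k] * x_k)).
    Valid as a joint distribution only if all theta_pair entries are <= 0."""
    p = len(theta)
    x = [0] * p
    samples = []
    for _ in range(n_sweeps):
        for j in range(p):
            eta = theta[j] + sum(theta_pair[j][k] * x[k] for k in range(p) if k != j)
            x[j] = poisson(rng, math.exp(eta))
        samples.append(tuple(x))
    return samples


def correlation(pairs):
    """Pearson correlation of a list of (a, b) count pairs."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
    vx = sum((a - mx) ** 2 for a in xs) / len(xs)
    vy = sum((b - my) ** 2 for b in ys) / len(ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
n = 20000
c1 = [common_shock_pair(rng, 2.0, 1.0, 3.0) for _ in range(n)]
c2 = [poisson_lognormal_pair(rng, 1.0, 0.5, -0.8) for _ in range(n)]
# Discard the first 100 Gibbs sweeps as burn-in.
c3 = poisson_mrf_gibbs(rng, [1.0, 1.0], [[0.0, -0.3], [-0.3, 0.0]], n + 100)[100:]

for name, s in [("common shock", c1), ("Poisson-lognormal", c2), ("Poisson MRF", c3)]:
    print(f"{name:18s} corr = {correlation(s):+.3f}")
```

For the chosen parameters, the common-shock correlation should be close to its theoretical value 2/sqrt(3 * 5), about 0.52, while the other two models produce negative sample correlations that the common-shock construction cannot reproduce.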