The Poisson-lognormal model as a versatile framework for the joint analysis of species abundances

Joint Species Distribution Models (JSDM) provide a general multivariate framework to study the joint abundances of all species from a community. JSDM account for both structuring factors (environmental characteristics or gradients, such as habitat type or nutrient availability) and potential interactions between the species (competition, mutualism, parasitism, etc.), which is instrumental in disentangling meaningful ecological interactions from mere statistical associations. Modeling the dependency between the species is challenging because of the count-valued nature of abundance data, and most JSDM rely on a Gaussian latent layer to encode the dependencies between species in a covariance matrix. The multivariate Poisson-lognormal (PLN) model is one such model, which can be viewed as a multivariate mixed Poisson regression model. The inference of such models raises both statistical and computational issues, many of which were solved in recent contributions using variational techniques and convex optimization. The PLN model turns out to be a versatile framework, within which a variety of analyses can be performed, including multivariate sample comparison, clustering of sites or samples, dimension reduction (ordination) for visualization purposes, or inference of interaction networks. This paper presents the general PLN framework and illustrates its use on a series of typical experimental datasets. All the models and methods are implemented in the R package PLNmodels, available from cran.r-project.org.
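
For readers unfamiliar with the model, the Gaussian latent layer and the mixed Poisson regression mentioned above can be written out as follows. This is a sketch in assumed notation that the abstract itself does not fix: $Y_{ij}$ is the count of species $j$ at site $i$ (with $n$ sites and $p$ species), $x_i$ is a vector of site covariates, $o_{ij}$ is an optional known offset such as sampling effort, $\beta_j$ collects the regression coefficients of species $j$, and $\Sigma$ is the $p \times p$ covariance matrix of the latent layer.

```latex
% Latent Gaussian layer: one p-dimensional vector per site, independent across sites
Z_i \sim \mathcal{N}_p(0, \Sigma), \qquad i = 1, \dots, n,

% Observation layer: counts are conditionally independent Poisson given the latent vector
Y_{ij} \mid Z_{ij} \sim \mathcal{P}\!\left( \exp\!\left( o_{ij} + x_i^\top \beta_j + Z_{ij} \right) \right),
\qquad j = 1, \dots, p.
```

Under this formulation, the covariates act on the mean abundances through $\beta_j$, while all dependencies between species are carried by $\Sigma$, since the counts are independent once $Z_i$ is given.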
