Variational inference for probabilistic Poisson PCA

Many application domains, such as ecology and genomics, must deal with multivariate non-Gaussian observations. A typical example is the joint observation of the abundances of a set of species across a series of sites, with the aim of understanding how these species co-vary. The Gaussian setting provides a canonical way to model such dependencies, but it does not apply in general. We consider here the multivariate exponential family framework, for which we introduce a generic model with multivariate Gaussian latent variables. We show that approximate maximum likelihood inference can be achieved via a variational algorithm to which gradient descent readily applies, and that this setting allows us to account for covariates and offsets. We then focus on the case of the Poisson-lognormal model in the context of community ecology.
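To make the variational objective concrete, the following is a minimal sketch of a lower bound of this kind for a Poisson-lognormal PCA model, not the authors' implementation. It assumes latent scores W_i ~ N(0, I_q), counts Y_ij ~ Poisson(exp(O_ij + mu_j + B_j . W_i)), and a Gaussian variational distribution N(M_i, diag(S2_i)) with diagonal covariance. The names `elbo`, `grad_M`, `O`, `mu`, `B`, `M` and `S2` are illustrative choices, and covariates are left out (they would add a term x_i . beta_j to the linear predictor). Plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def expected_rate(i, j, O, mu, B, M, S2):
    """Return (mean, E_q[exp(Z)]) for the log-intensity of count (i, j)."""
    q = len(B[0])
    # Mean and variance of O_ij + mu_j + B_j . W_i under the variational law.
    mean = O[i][j] + mu[j] + sum(B[j][k] * M[i][k] for k in range(q))
    var = sum(B[j][k] ** 2 * S2[i][k] for k in range(q))
    # Lognormal moment: E[exp(Z)] = exp(mean + var / 2).
    return mean, math.exp(mean + 0.5 * var)


def elbo(Y, O, mu, B, M, S2):
    """Variational lower bound on the log-likelihood (assumed model above).

    Y, O: n x p counts and log-scale offsets; mu: length-p means;
    B: p x q loadings; M, S2: n x q variational means and variances (> 0).
    """
    n, p, q = len(Y), len(mu), len(B[0])
    total = 0.0
    for i in range(n):
        for j in range(p):
            mean, rate = expected_rate(i, j, O, mu, B, M, S2)
            # Expected Poisson log-density under the variational law.
            total += Y[i][j] * mean - rate - math.lgamma(Y[i][j] + 1)
        # Subtract KL( N(M_i, diag(S2_i)) || N(0, I_q) ).
        total -= 0.5 * sum(
            M[i][k] ** 2 + S2[i][k] - math.log(S2[i][k]) - 1.0
            for k in range(q)
        )
    return total


def grad_M(Y, O, mu, B, M, S2):
    """Gradient of the bound with respect to the variational means M."""
    n, p, q = len(Y), len(mu), len(B[0])
    grad = [[0.0] * q for _ in range(n)]
    for i in range(n):
        for j in range(p):
            _, rate = expected_rate(i, j, O, mu, B, M, S2)
            for k in range(q):
                grad[i][k] += (Y[i][j] - rate) * B[j][k]
        for k in range(q):
            grad[i][k] -= M[i][k]  # contribution of the KL term
    return grad


if __name__ == "__main__":
    # Toy data: one site, two species, one latent dimension.
    Y, O, mu = [[3, 0]], [[0.0, 0.0]], [0.0, 0.0]
    B, M, S2 = [[1.0], [-1.0]], [[0.0]], [[1.0]]
    g = grad_M(Y, O, mu, B, M, S2)
    # One gradient ascent step on M with a small, hand-picked step size.
    M_new = [[M[0][0] + 0.1 * g[0][0]]]
    print(elbo(Y, O, mu, B, M, S2), elbo(Y, O, mu, B, M_new, S2))
```

In practice the bound would be maximized jointly over the model parameters (`mu`, `B`) and the variational parameters (`M`, `S2`), typically with a quasi-Newton or other gradient-based optimizer rather than a fixed step size. The single step above only illustrates that the gradient is available in closed form.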
