Zero‐inflated Poisson factor model with application to microbiome read counts

Dimension reduction of high‐dimensional microbiome data facilitates downstream analyses such as regression and clustering. Most existing dimension reduction methods cannot fully accommodate two special features of these data: the reads are count‐valued, and they contain an excess of zeros. In this paper, we propose a zero‐inflated Poisson factor analysis model. The model assumes that microbiome read counts follow zero‐inflated Poisson distributions with library size as an offset, and that the Poisson rates are negatively related to the probability of an inflated (excess) zero. The latent parameters of the model form a low‐rank matrix consisting of interpretable loadings and low‐dimensional scores that can be used for further analyses. We develop an efficient and robust expectation‐maximization algorithm for parameter estimation and demonstrate the efficacy of the proposed method in comprehensive simulation studies. An application to the Oral Infections, Glucose Intolerance, and Insulin Resistance Study provides valuable insights into the relationship between the subgingival microbiome and periodontal disease.
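The model structure described above can be made concrete with a small sketch of the observed-data log-likelihood. This is an illustration under stated assumptions, not the paper's implementation: the log link for the Poisson rate, the logistic form tying the zero-inflation probability to the rate through a slope `tau`, and all function and argument names (`zip_factor_loglik`, `scores`, `loadings`, `lib_size`) are hypothetical choices made here. The abstract states only that rates and inflated zeros are negatively related.

```python
import numpy as np
from math import lgamma


def zip_factor_loglik(counts, lib_size, scores, loadings, tau):
    """Log-likelihood of a zero-inflated Poisson factor model (illustrative).

    counts   : (n, m) array of read counts (n samples, m taxa)
    lib_size : (n,) array of library sizes, used as a multiplicative offset
    scores   : (n, k) low-dimensional sample scores
    loadings : (m, k) taxon loadings
    tau      : assumed positive slope linking rate and zero inflation
    """
    # Low-rank matrix of log rates, one entry per sample/taxon pair.
    log_rate = scores @ loadings.T
    # Poisson mean: library size enters as an offset on the log scale.
    mu = lib_size[:, None] * np.exp(log_rate)
    # Assumed link: the probability of an inflated zero shrinks as the
    # rate grows (negative relation), via a logistic function of log_rate.
    p_zero = 1.0 / (1.0 + np.exp(tau * log_rate))
    # Poisson log-pmf, with log(x!) computed as lgamma(x + 1).
    log_fact = np.vectorize(lgamma)(counts + 1.0)
    log_pois = counts * np.log(mu) - mu - log_fact
    # A zero arises either from inflation or from the Poisson part.
    ll_zero = np.log(p_zero + (1.0 - p_zero) * np.exp(-mu))
    # A positive count can only come from the Poisson part.
    ll_pos = np.log1p(-p_zero) + log_pois
    return float(np.sum(np.where(counts == 0, ll_zero, ll_pos)))
```

An expectation‐maximization scheme would maximize a function of this kind over `scores`, `loadings`, and `tau`; the sketch only evaluates it for given values, which is enough to check a fitted model or to compare candidate ranks `k`.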
