A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data

A major challenge in the analysis of genomic data is detecting interactions that reflect functional associations in large-scale observations. In this study, we propose a new algorithm, cPLS, which combines the partial least squares approach with negative binomial regression to reconstruct genomic association networks from high-dimensional next-generation sequencing count data. The approach is applied directly to raw count data and requires no further pre-processing steps. In the settings investigated, cPLS outperformed two widely used comparison methods, the graphical lasso and weighted correlation network analysis. In addition, cPLS can estimate the full network for thousands of genes without a heavy computational load. Finally, we demonstrate that cPLS is capable of finding biologically meaningful associations by analyzing an example data set from a previously published study of the molecular anatomy of craniofacial development.
