Statistical Data Analysis and Modeling

The availability of large structured datasets has created a need for efficient data analysis and modeling techniques. In systems biology, data-driven modeling approaches build models of complex cellular systems without making assumptions about the underlying mechanisms. In this chapter, we discuss eigenvalue-based approaches, which identify the important characteristics (information) of large datasets through decomposition and dimensionality reduction. We address singular value decomposition (SVD), principal component analysis (PCA), and partial least squares regression (PLSR) as approaches to data-driven modeling. For multi-linear systems, in which several datasets share characteristics such as time points or measurements, tensor decomposition becomes particularly important for understanding higher-order datasets. We therefore also discuss how to scale these methods up to tensor decomposition, using an example that deals with host-cell responses to viral infection.
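As a rough illustration of the decomposition and dimensionality reduction mentioned above, the following minimal sketch computes a PCA through the SVD of a mean-centered data matrix using NumPy. It is not taken from the chapter: the function name `pca_via_svd`, the synthetic 50-by-6 dataset, and the choice of two components are all assumptions made only for this example.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """PCA of X (samples x variables) via SVD of the mean-centered matrix.

    Hypothetical helper for illustration; returns scores, loadings, and the
    fraction of total variance captured by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per component
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T               # variable weights (orthonormal columns)
    return scores, loadings, explained[:n_components]


# Synthetic stand-in for a measurement matrix (an assumption, not real data):
# 50 samples of 6 measured variables driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(50, 6))

scores, loadings, explained = pca_via_svd(X, n_components=2)
print("scores shape:", scores.shape)
print("loadings shape:", loadings.shape)
print("variance captured by 2 components:", explained.sum())
```

Because the synthetic data are generated from two latent factors, the first two components should capture nearly all of the variance, which is the sense in which the decomposition reduces six measured variables to two informative dimensions.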
