Visualization and Curve-Parameter Estimation Strategies for Efficient Exploration of Phenotype Microarray Kinetics

Background: The Phenotype MicroArray (OmniLog® PM) system can simultaneously capture a large number of phenotypes by recording an organism's respiration over time on distinct substrates. This technique targets the object of natural selection itself, the phenotype, whereas previously established ‘-omics’ techniques merely study components that ultimately contribute to it. Recording respiration over time, however, adds a longitudinal dimension to the data. To exploit this information fully, it must be extracted from the shapes of the recorded curves and displayed in analogy to conventional growth curves.

Methodology: The free software environment R was explored for both visualizing and fitting PM respiration curves. Approaches using either a model fit (with commonly applied growth models) or a smoothing spline were evaluated. Their reliability in inferring curve parameters and confidence intervals was compared with that of the native OmniLog® PM analysis software. We also consider the post-processing of the estimated parameters, the optimal classification of curve shapes and the detection of significant differences between them, as well as practically relevant questions such as detecting the impact of cultivation times and determining the minimum required number of experimental repeats.

Conclusions: We provide a comprehensive framework for data visualization and parameter estimation according to user choices. A flexible graphical representation strategy for displaying the results is proposed, including 95% confidence intervals for the estimated parameters. For calculating both point estimates and confidence intervals, the spline approach is less sensitive to irregular curve shapes than fitting any of the considered models or using the native PM software. These estimates can serve as a starting point for the automated post-processing of PM data, providing much more information than a strict dichotomization into positive and negative reactions.
Our results form the basis for a freely available R package for the analysis of PM data.
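The idea of reducing each respiration curve to a few parameters read off a smoothed curve, then attaching 95% confidence intervals, can be illustrated with a minimal pure-Python sketch. It is not the authors' R implementation and rests on several assumptions. A Whittaker-Henderson smoother (a discrete penalized-difference smoother closely related to P-splines) stands in for a smoothing spline. The parameter names `A` (maximum height), `mu` (maximum slope), `lambda` (lag) and `AUC` (area under the curve) follow common growth-curve conventions and are assumed here. The lag definition, the smoothing parameter `lam`, the percentile bootstrap over replicate wells, the helper names, and the simulated logistic data are likewise hypothetical choices made for illustration.

```python
import math
import random


def whittaker_smooth(y, lam=10.0):
    """Whittaker-Henderson smoother (stand-in for a smoothing spline).

    Minimises sum((y - z)**2) + lam * sum((second differences of z)**2) by
    solving (I + lam * D'D) z = y, where D is the second-difference operator.
    """
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    d = (1.0, -2.0, 1.0)  # one row of D, acting on columns k, k+1, k+2
    for k in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[k + p][k + q] += lam * d[p] * d[q]
    b = [float(v) for v in y]
    # The system matrix is symmetric positive definite and pentadiagonal, so
    # elimination without pivoting is stable and never fills outside the band.
    for col in range(n):
        for row in range(col + 1, min(col + 3, n)):
            f = a[row][col] / a[col][col]
            for j in range(col, min(col + 3, n)):
                a[row][j] -= f * a[col][j]
            b[row] -= f * b[col]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = b[row] - sum(a[row][j] * z[j] for j in range(row + 1, min(row + 3, n)))
        z[row] = s / a[row][row]
    return z


def curve_parameters(t, y, lam=10.0):
    """Hypothetical helper: read A, mu, lambda and AUC off the smoothed curve."""
    z = whittaker_smooth(y, lam)
    n = len(t)
    # Slope at each interior time point via central differences.
    slopes = [(z[i + 1] - z[i - 1]) / (t[i + 1] - t[i - 1]) for i in range(1, n - 1)]
    k = max(range(len(slopes)), key=lambda j: slopes[j])
    i_mu, mu = k + 1, slopes[k]
    # Assumed lag convention: tangent at the steepest point meets the start level z[0].
    lag = t[i_mu] - (z[i_mu] - z[0]) / mu if mu > 0 else math.nan
    # Area under the smoothed curve by the trapezoidal rule.
    auc = sum((t[i + 1] - t[i]) * (z[i] + z[i + 1]) / 2.0 for i in range(n - 1))
    return {"A": max(z), "mu": mu, "lambda": lag, "AUC": auc}


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of replicate estimates."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(values) for _ in values) / len(values) for _ in range(n_boot)
    )
    lo = means[int((1.0 - level) / 2.0 * n_boot)]
    hi = means[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


def simulated_curve(rng, hours=96, noise=5.0):
    """Hypothetical OmniLog-like curve: hourly logistic rise plus Gaussian noise."""
    t = [float(h) for h in range(hours + 1)]
    y = [20.0 + 250.0 / (1.0 + math.exp(-0.2 * (h - 30.0))) + rng.gauss(0.0, noise) for h in t]
    return t, y


# Usage: five simulated replicate wells, one 95% CI for the mean maximum slope.
rng = random.Random(42)
replicates = [curve_parameters(*simulated_curve(rng)) for _ in range(5)]
mus = [p["mu"] for p in replicates]
print("mu per replicate:", [round(m, 2) for m in mus])
print("95% CI of mean mu:", bootstrap_ci(mus))
```

Each well thus contributes one value per parameter rather than a positive/negative call, and replicate wells yield an interval for each parameter. Dense matrix storage keeps the solver short; a banded or sparse solver would suit densely sampled curves better. In R itself, `smooth.spline()` from the stats package would be the more direct route.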
