Bayesian methods in bioinformatics and computational systems biology

Bayesian methods are valuable, inter alia, whenever there is a need to extract information from data that are uncertain or subject to any kind of error or noise (including measurement error and experimental error, as well as noise or random variation intrinsic to the process of interest). Bayesian methods offer a number of advantages over more conventional statistical techniques that make them particularly appropriate for complex data. It is therefore no surprise that Bayesian methods are becoming more widely used in the fields of genetics, genomics, bioinformatics and computational systems biology, where making sense of complex noisy data is the norm. This review provides an introduction to the growing literature in this area, with particular emphasis on recent developments in Bayesian bioinformatics relevant to computational systems biology.
