Computational methods for systems biology: analysis of high-throughput measurements and modeling of genetic regulatory networks

High-throughput measurement techniques have revolutionized molecular biology by steering biological research toward approaches that involve extensive collection of experimental data and integrated analysis of biological systems on a genome-wide scale. Integration of experimental and computational approaches to understand complex biological systems, known as computational systems biology, has the potential to play a profound role in future life science discoveries. Analyzing massive amounts of measurement data and modeling high-dimensional biological systems inevitably require advanced computational methods in order to draw valid biological conclusions. This thesis introduces novel computational methods for problems encountered in systems biology. The content of the thesis is threefold. The first part introduces methods for preprocessing high-throughput measurements. Two general methods are developed for correcting systematic distortions originating from sample heterogeneity and sample asynchrony. The former distortion is typically present in experiments conducted on non-homogeneous cell populations, and the latter is encountered in practically all biological time series experiments. The second part focuses on robust time series analysis. General methods for both robust spectrum estimation and robust periodicity detection are introduced. Robust computational methods are preferred because the exact statistical characteristics of high-throughput data are generally unknown and the measurements are also prone to other non-idealities, such as outliers and distortion of the original waveform. The third part is devoted to integrated analysis of genetic regulatory networks, or biological networks as they are also called, on a global scale. The effect of certain Post function classes on general properties of genetic
