A control study to evaluate a computer-based microarray experiment design recommendation system for gene-regulation pathways discovery

This paper evaluates a system that uses the expected value of experimentation to discover causal pathways from gene expression data. By experimentation we mean both interventions (e.g., a gene knock-out experiment) and observations (e.g., passively observing the expression level of a "wild-type" gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which applies expected value of experimentation (EVE) to the discovery of causal pathways from gene expression data. GEEVE assists biologists in their quest to discover gene-regulation pathways in three ways: (1) recommending which experiments to perform (with a focus on "knock-out" experiments) using an EVE method; (2) recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method; and (3) providing a Bayesian analysis that combines prior knowledge with the results of recent microarray experiments to derive posterior probabilities of gene-regulation relationships. In recommending which experiments to perform (and how many times to repeat them), the EVE approach takes into account the biologist's preferences about which genes the discovery process should focus on. Because exact EVE calculations require exponential time, GEEVE incorporates approximation methods. GEEVE can combine data from knock-out experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those experiments. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation.
Using the simulator, we evaluated GEEVE in a randomized controlled study involving 10 biologists, some of whom used GEEVE and some of whom did not. The results show that biologists who used GEEVE reached correct causal assessments about gene regulation more often than those who did not. The GEEVE users also reached their assessments more cost-effectively.
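
To make the idea of choosing an experiment by its expected value concrete, the following is a minimal sketch and not GEEVE's actual algorithm. It assumes a toy setting with two genes, three hypothetical causal hypotheses, binary experimental outcomes, invented likelihoods and costs, and a simple utility equal to the probability of naming the correct hypothesis. It enumerates outcomes exactly, whereas the abstract states that GEEVE relies on approximation methods. All names (`expected_value_of_experiment`, `prior`, `experiments`) are illustrative.

```python
from typing import Dict

Prior = Dict[str, float]                   # P(hypothesis)
Likelihood = Dict[str, Dict[str, float]]   # P(outcome | hypothesis)


def expected_value_of_experiment(prior: Prior, likelihood: Likelihood, cost: float) -> float:
    """Expected gain in the probability of naming the correct hypothesis, minus cost.

    Assumed utility: 1 if the most probable hypothesis is the true one, else 0.
    """
    # Without the experiment, the best guess is the most probable hypothesis.
    baseline = max(prior.values())
    outcomes = next(iter(likelihood.values())).keys()
    expected_best = 0.0
    for outcome in outcomes:
        # P(outcome) * max_h P(h | outcome) simplifies to max_h P(h) * P(outcome | h).
        expected_best += max(prior[h] * likelihood[h][outcome] for h in prior)
    return expected_best - baseline - cost


# Hypothetical prior beliefs about two genes X and Y (illustrative numbers).
prior: Prior = {
    "X_regulates_Y": 0.5,
    "Y_regulates_X": 0.2,
    "latent_common_cause": 0.3,
}

# Hypothetical outcome models. Passive observation sees the same association
# under every hypothesis; a knock-out changes the other gene's expression
# mainly when the knocked-out gene is the regulator.
experiments = {
    "observe_wild_type": (
        {h: {"changed": 0.9, "unchanged": 0.1} for h in prior},
        0.01,
    ),
    "knock_out_X": (
        {
            "X_regulates_Y": {"changed": 0.9, "unchanged": 0.1},
            "Y_regulates_X": {"changed": 0.1, "unchanged": 0.9},
            "latent_common_cause": {"changed": 0.1, "unchanged": 0.9},
        },
        0.05,
    ),
    "knock_out_Y": (
        {
            "X_regulates_Y": {"changed": 0.1, "unchanged": 0.9},
            "Y_regulates_X": {"changed": 0.9, "unchanged": 0.1},
            "latent_common_cause": {"changed": 0.1, "unchanged": 0.9},
        },
        0.05,
    ),
}

# Score every candidate experiment and recommend the one with the highest net value.
values = {
    name: expected_value_of_experiment(prior, lik, cost)
    for name, (lik, cost) in experiments.items()
}
best = max(values, key=values.get)
print(best, values)
```

In this toy setting, passive observation has no positive expected value because its outcome is equally likely under every hypothesis, while the knock-out experiments can separate the hypotheses. That is only a property of the assumed numbers, but it illustrates why a recommendation system would weigh interventions against observations and their costs.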
