Explore Biological Pathways from Noisy Array Data by Directed Acyclic Boolean Networks

We consider directed acyclic Boolean (DAB) networks as a tool for exploring biological pathways. In a DAB network, the basic objects are binary elements and their Boolean duals. A DAB network is characterized by two kinds of pairwise relations: similarity and prerequisite. The latter is a partial order relation: the on-status of one element is necessary for the on-status of another element. A DAB network is uniquely determined by the state space of its elements. We arrange samples from the state space of a DAB network in a binary array and introduce a random mechanism of measurement error. Our inference strategy consists of two stages. First, we consider each pair of elements, identify their most likely relation, and assign a score, the s-p-score, to this relation. Second, we rank the s-p-scores obtained in the first stage. We expect that relations with smaller s-p-scores are more likely to be true, and those with larger s-p-scores are more likely to be false. The key idea is the definition of s-scores (referring to similarity), p-scores (referring to prerequisite), and s-p-scores. As with classical statistical tests, control of false negatives and false positives is our primary concern. We illustrate the method with a simulated example, the classical arginine biosynthetic pathway, and show some exploratory results on a published microarray expression dataset of the yeast Saccharomyces cerevisiae, obtained from experiments with activation and genetic perturbation of the pheromone response MAPK pathway.
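The setup described above can be illustrated with a small sketch. It assumes a hypothetical three-element network in which element 0 is a prerequisite for element 1 and element 2 is similar to element 0, and it assumes that measurement error flips each entry independently with probability `eps`. The function names, the network, and the "forbidden fraction" summary are illustrative assumptions only; in particular, the summary is a naive stand-in and not the s-score, p-score, or s-p-score defined in the paper.

```python
import random


def sample_states(n, rng):
    """Draw n states of a toy network (hypothetical, for illustration).

    Element 0 is a prerequisite for element 1: element 1 can be on
    only when element 0 is on. Element 2 is similar to element 0.
    """
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5
        b = a and (rng.random() < 0.5)  # on-status of a is necessary for b
        c = a                           # c always matches a
        rows.append((int(a), int(b), int(c)))
    return rows


def add_noise(rows, eps, rng):
    """Flip each binary entry independently with probability eps (assumed error model)."""
    return [tuple(x ^ int(rng.random() < eps) for x in row) for row in rows]


def pair_counts(rows, i, j):
    """Return the 2x2 table of joint states for elements i and j."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for row in rows:
        counts[(row[i], row[j])] += 1
    return counts


def forbidden_fraction(rows, i, j):
    """Fraction of samples with element i off and element j on.

    If i were a prerequisite for j and the data were noise-free, this
    would be exactly zero. It is a naive proxy, not the paper's p-score.
    """
    return pair_counts(rows, i, j)[(0, 1)] / len(rows)


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = add_noise(sample_states(2000, rng), eps=0.05, rng=rng)
    # Compare both directions of the candidate prerequisite relation.
    print(forbidden_fraction(noisy, 0, 1), forbidden_fraction(noisy, 1, 0))
```

With noise-free samples, the cell "element 0 off, element 1 on" is empty; with noise it is merely rare, which is why a score that tolerates measurement error and can be ranked across pairs is needed rather than a hard logical check.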
