A Causal Model for Disease Pathway Discovery

Pathways provide deep insight into the mechanisms of biological processes. With the growth of high-throughput gene expression monitoring technology, many data-driven methods have been proposed to reconstruct pathways from observational data. The main challenge for existing methods is the low reliability of the discovered results, especially the direction of the regulatory relations. In this work, a disease pathway discovery method based on level-wise causal search (LWCS) is proposed. Three steps are conducted at each search level of LWCS to locate the causal variables: first, in the parents and children (PC) discovery step, a structure learning approach is employed to discover the candidate causal genes; then, in the causal direction learning step, additive noise models are used to determine the direction of the edges; finally, trivial causal candidates are pruned and excluded from the search at further levels. The proposed method is tested and verified on real-life gene expression data sets. Its success suggests that causality is a proper model for representing the regulatory relations among genes and phenotypes.
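
To illustrate the causal direction learning step, the following is a minimal sketch of a pairwise additive noise model test. It is not the implementation used in this work. It assumes a cubic polynomial regression and a biased HSIC dependence score with a Gaussian kernel; the function names (`hsic`, `residual_dependence`, `anm_direction`), the synthetic data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def _gaussian_gram(v):
    """Gaussian kernel matrix; bandwidth set by the median heuristic (an assumption)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))


def hsic(a, b):
    """Biased HSIC estimate; values near zero suggest independence."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k, l = _gaussian_gram(a), _gaussian_gram(b)
    return float(np.trace(k @ h @ l @ h)) / n**2


def residual_dependence(cause, effect, degree=3):
    """Regress effect on cause, then score how strongly the residuals depend on cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def anm_direction(x, y, degree=3):
    """Prefer the direction whose regression residuals look more independent of the input."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y, degree)   # model y = f(x) + noise
    backward = residual_dependence(y, x, degree)  # model x = g(y) + noise
    return ("x->y" if forward < backward else "y->x"), forward, backward


# Synthetic example (hypothetical data): x causes y through a nonlinear function.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
y = x**3 + rng.uniform(-1.0, 1.0, size=400)
direction, fwd, bwd = anm_direction(x, y)
print(direction, fwd, bwd)
```

The sketch relies on the asymmetry that additive noise models exploit: in the true causal direction the regression residuals are independent of the input, whereas in the reverse direction they generally are not. A practical version would replace the score comparison with a statistical independence test and a more flexible regressor.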
