Inferring Causal Relations from Multivariate Time Series: A Fast Method for Large-Scale Gene Expression Data

Various multivariate time series analysis techniques have been developed with the aim of inferring causal relations between time series. These techniques have previously proved effective on economic and neurophysiological data, which normally consist of hundreds of samples. In gene regulatory inference, however, the small sample size of gene expression time series poses an obstacle. In this paper, we describe some of the most commonly used multivariate inference techniques and show the challenges they face in gene expression analysis. In response, we propose a directed partial correlation (DPC) algorithm as an efficient and effective solution for inferring causal/regulatory relations from small-sample gene expression data. Comparative evaluations of the existing techniques and the proposed method are presented. Because reliable conclusions require comprehensive benchmarking on data sets with various setups, three experiments are designed to assess these methods in a coherent manner. Detailed analysis of the experimental results not only reveals the good accuracy of the proposed DPC method in large-scale prediction, but also gives insight into all of the methods under evaluation.
