Learning Regulatory Networks from Sparsely Sampled Time Series Expression Data

We present a probabilistic modeling approach to learning gene transcriptional regulation networks from time series gene expression data that is appropriate for the sparsely and irregularly sampled time series datasets currently available. We use a clustering algorithm based on statistical splines to estimate continuous probabilistic models for clusters of genes with similar time expression profiles and for individual genes. Using the learned models, we define a novel mutual information score for candidate causal edges, at a given time lag δ, between pairs of clusters and between pairs of genes. This score computes dependency between expression values as continuous quantities rather than discretizing them. We present empirical results on time series data for the yeast cell cycle. We use randomization trials to determine statistically significant candidate network edges and the Chow-Liu graph learning algorithm to learn the network structure, yielding a dynamic model of cell cycle regulation. Biological validation of the inferred network suggests that our method can learn a meaningful, higher-level view of regulatory networks from sparse time series data.
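The pipeline summarized above (a lagged dependency score, randomization trials, and Chow-Liu structure learning) can be illustrated with a minimal Python sketch. It is not the method of this paper. It assumes each expression curve has already been evaluated on a dense, uniform time grid (for example from a fitted spline), and it swaps in a simple bivariate Gaussian estimate of mutual information, I = -0.5 log(1 - ρ²), in place of the spline-based score. The function names, the permutation null, and the integer grid lag `lag` are all hypothetical choices made for illustration.

```python
import numpy as np

def lagged_gaussian_mi(x, y, lag):
    """Mutual information (nats) between x(t) and y(t + lag).

    Assumption: the pair is treated as bivariate Gaussian, so the score
    depends only on the Pearson correlation of the aligned samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]        # align x(t) with y(t + lag)
    rho = np.corrcoef(x, y)[0, 1]
    rho2 = min(rho * rho, 1.0 - 1e-12)  # guard against log(0)
    return -0.5 * np.log(1.0 - rho2)

def permutation_p_value(x, y, lag, n_trials=200, seed=0):
    """Randomization trial: compare the observed score with scores obtained
    after shuffling the time points of x (a hypothetical null model)."""
    rng = np.random.default_rng(seed)
    observed = lagged_gaussian_mi(x, y, lag)
    null = [lagged_gaussian_mi(rng.permutation(x), y, lag)
            for _ in range(n_trials)]
    exceed = sum(score >= observed for score in null)
    return (1 + exceed) / (1 + n_trials)

def chow_liu_tree(weights):
    """Chow-Liu structure: maximum-weight spanning tree over a symmetric
    matrix of pairwise scores, built with Prim's algorithm.
    Returns a list of (i, j) edges."""
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or w[i, j] > best[0]):
                    best = (w[i, j], i, j)
        edges.append((best[1], best[2]))
        in_tree.append(best[2])
    return edges
```

In this sketch the tree returned by `chow_liu_tree` is undirected. A direction for each edge would have to come from the sign of the lag at which its score was computed, and only pairs that pass the randomization test would be offered to the tree as candidates.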