Generalized correlation measure using count statistics for gene expression data with ordered samples

Motivation: Capturing association patterns in gene expression levels under different conditions or time points is important for inferring gene regulatory interactions. In practice, temporal changes in gene expression may result in complex association patterns that require more sophisticated detection methods than simple correlation measures. For instance, the effect of regulation may lead to time-lagged associations and interactions local to a subset of samples. Furthermore, expression profiles of interest may not be aligned or directly comparable (e.g. gene expression profiles from two species).

Results: We propose a count statistic for measuring association between pairs of gene expression profiles consisting of ordered samples (e.g. time-course data), where correlation may exist only locally, in subsequences separated by a position shift. The statistic is simple and fast to compute, and we illustrate its use in two applications. In a cross-species comparison of developmental gene expression levels, we show that our method not only measures the association of gene expression between the two species but also provides an alignment between different developmental stages. In the second application, we apply our statistic to expression profiles from two distinct phenotypic conditions, where the samples in each profile are ordered by the associated phenotypic values. The detected associations can be useful in building a correspondence between gene association networks under different phenotypes. On the theoretical side, we provide asymptotic distributions of the statistic for different regions of the parameter space and test its power on simulated data.

Availability and implementation: The code used to perform the analysis is available as part of the Supplementary Material.

Contact: msw@usc.edu or hhuang@stat.berkeley.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
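The abstract does not spell out the statistic itself, so the following is only a loose sketch of the general idea of counting local agreements between two ordered profiles while allowing a position shift. It assumes that "local agreement" means two length-k windows share the same rank pattern. The function names `rank_pattern` and `shifted_pattern_count` and the parameters `k` and `max_shift` are hypothetical, chosen for illustration, and are not taken from the authors' code.

```python
def rank_pattern(window):
    """Return the order of indices that sorts the window (its rank pattern)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def shifted_pattern_count(x, y, k=3, max_shift=1):
    """Count pairs of length-k windows, one from x and one from y, whose
    start positions differ by at most max_shift and whose rank patterns agree.

    Illustrative assumption only; not the statistic defined in the paper.
    """
    px = [rank_pattern(x[i:i + k]) for i in range(len(x) - k + 1)]
    py = [rank_pattern(y[j:j + k]) for j in range(len(y) - k + 1)]
    count = 0
    for i, pattern in enumerate(px):
        lo = max(0, i - max_shift)
        hi = min(len(py) - 1, i + max_shift)
        for j in range(lo, hi + 1):
            if pattern == py[j]:
                count += 1
    return count


# Toy profiles: y is x delayed by one position.
x = [0, 1, 3, 2, 5, 4]
y = [9, 0, 1, 3, 2, 5]
aligned_only = shifted_pattern_count(x, y, k=3, max_shift=0)
with_shift = shifted_pattern_count(x, y, k=3, max_shift=1)
```

In this toy example the count with `max_shift=0` misses the lagged relationship, while allowing a shift of one position recovers it. A real analysis would also need a null distribution for the count, which the paper addresses through its asymptotic results.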
