Extracting Dynamics from Static Cancer Expression Data

Static expression experiments analyze samples from many individuals. These samples are often snapshots of the progression of a disease such as cancer. This raises an intriguing question: Can we determine a temporal order for these samples? Such an ordering can lead to a better understanding of the dynamics of the disease and to the identification of genes associated with its progression. In this paper, we formally prove, for the first time, that under a model for the dynamics of the expression levels of a single gene, it is indeed possible to recover the correct ordering of the static expression data sets by solving an instance of the traveling salesman problem (TSP). In addition, we devise an algorithm that combines a TSP heuristic and probabilistic modeling for inferring the underlying temporal order of the microarray experiments. This algorithm constructs probabilistic continuous curves to represent expression profiles and can thus account for noise and for individual background expression differences, leading to accurate temporal reconstruction for human data. Applying our method to cancer expression data, we show that the derived ordering agrees well with survival duration. A classifier that utilizes this ordering improves upon other classifiers suggested for this task. The set of genes displaying consistent behavior for the determined ordering is enriched for genes associated with cancer progression.
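To make the TSP formulation concrete, the following is a minimal Python sketch of ordering unlabeled samples by searching for a short open path through their expression vectors. It is not the algorithm of the paper, which combines a TSP heuristic with probabilistic curve modeling. The function names, the nearest-neighbor plus 2-opt heuristic, and the three noise-free monotone gene trajectories are all assumptions chosen for illustration.

```python
import math
import random


def path_length(points, order):
    """Total Euclidean length of the path that visits points in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def two_opt(points, order):
    """Reverse sub-paths for as long as doing so strictly shortens the path."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_length(points, candidate) < path_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order


def order_samples(points):
    """Heuristic shortest open path: nearest neighbor from every start, then 2-opt."""
    best = None
    for start in range(len(points)):
        order, remaining = [start], set(range(len(points))) - {start}
        while remaining:
            last = order[-1]
            nearest = min(remaining, key=lambda j: math.dist(points[last], points[j]))
            order.append(nearest)
            remaining.remove(nearest)
        order = two_opt(points, order)
        if best is None or path_length(points, order) < path_length(points, best):
            best = order
    return best


def simulate_samples(n_samples=8, seed=0):
    """Hypothetical noise-free data: three genes with monotone trajectories."""
    times = [k / (n_samples - 1) for k in range(n_samples)]
    rng = random.Random(seed)
    rng.shuffle(times)  # hide the true temporal order of the samples
    # Assumed profiles: one rising, one saturating, one decaying gene.
    points = [(t, 1.0 - math.exp(-3.0 * t), (1.0 - t) ** 2) for t in times]
    return times, points


times, points = simulate_samples()
recovered = order_samples(points)
recovered_times = [times[i] for i in recovered]
print(recovered_times)
```

A path has the same length in both directions, so a sketch like this can recover the order only up to reversal; deciding which end is early and which is late requires outside information, such as survival data. With noisy measurements or non-monotone profiles the shortest path need not match the true order, which is the situation the probabilistic modeling described above is meant to address.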
