Continuous hidden process model for time series expression experiments

MOTIVATION: When analyzing expression experiments, researchers are often interested in identifying the set of biological processes that are up- or down-regulated under the experimental condition studied. Current approaches, including clustering expression profiles and averaging the expression profiles of genes known to participate in specific processes, fail to provide an accurate estimate of the activity levels of many biological processes.

RESULTS: We introduce a probabilistic Continuous Hidden Process Model (CHPM) for time series expression data. CHPM simultaneously determines the most probable assignment of genes to processes and the level of activation of these processes over time. To estimate model parameters, CHPM uses multiple time series datasets and incorporates prior biological knowledge. Applying CHPM to yeast expression data, we show that our algorithm produces more accurate functional assignments for genes than other expression analysis methods. The inferred process activity levels can be used to study the relationships between biological processes. We also report new biological experiments confirming some of the process activity levels predicted by CHPM.

AVAILABILITY: A Java implementation is available at http://www.cs.cmu.edu/~yanxins/chpm.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
