Knowledge-guided gene ranking by coordinative component analysis

Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during carcinogenesis. It is therefore important to prioritize the genes that are strongly associated with the functionality of a network. Traditional statistical methods are often ill-suited to identifying biologically relevant member genes, which has motivated researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to capture fully the interplay between biological knowledge and gene expression data.

Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA). COCA explicitly captures the genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem that maximizes the coordinative effort, COCA first extracts the coordinative components under partial guidance from knowledge genes and then ranks the genes according to their participation strengths. An embedded bootstrapping procedure improves the statistical robustness of the solutions. COCA was tested first on simulation data and then on published gene expression microarray data, where it showed improved performance compared with traditional statistical methods. Finally, COCA was applied to stem cell data to identify biologically relevant genes in signaling pathways. The approach uncovers novel pathway members that may shed light on pathway deregulation in cancers.

Conclusion: We have developed a new integrative strategy that combines biological knowledge and microarray data for gene ranking. The method uses knowledge genes as guidance to first extract coordinative components and then rank the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.
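
The abstract describes a three-step workflow: extract a component guided by knowledge genes, score every gene by its participation in that component, and stabilize the scores by bootstrapping. The sketch below illustrates only that general workflow and is not the COCA optimization itself, whose formulation is not given here. As a stand-in it assumes the component is the leading singular vector of the knowledge genes' centered expression profiles, and that participation strength is the absolute cosine between a gene's centered profile and that component. The function name `knowledge_guided_rank` and its parameters are hypothetical.

```python
import numpy as np


def knowledge_guided_rank(X, knowledge_idx, n_boot=200, seed=0):
    """Rank genes by bootstrap-averaged association with a knowledge-gene component.

    X             : (n_genes, n_samples) expression matrix.
    knowledge_idx : row indices of the knowledge genes.
    Returns (order, scores): gene indices from strongest to weakest, and the
    averaged score of every gene.

    NOTE: an illustrative stand-in (SVD plus cosine scoring), not COCA.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    scores = np.zeros(n_genes)

    for _ in range(n_boot):
        # Resample the samples (columns) with replacement.
        cols = rng.choice(n_samples, size=n_samples, replace=True)
        Xb = X[:, cols]
        # Center every gene profile within the bootstrap replicate.
        Xb = Xb - Xb.mean(axis=1, keepdims=True)

        # Assumed component: leading right singular vector of the
        # knowledge-gene submatrix (a unit-norm profile across samples).
        _, _, vt = np.linalg.svd(Xb[knowledge_idx], full_matrices=False)
        component = vt[0]

        # Assumed participation strength: absolute cosine between each
        # centered gene profile and the component.
        norms = np.linalg.norm(Xb, axis=1)
        norms[norms == 0] = 1.0  # constant profiles receive a score of 0
        scores += np.abs(Xb @ component) / norms

    scores /= n_boot
    order = np.argsort(-scores)
    return order, scores
```

In use, `X` would hold one row per gene and one column per array, and `knowledge_idx` would list the rows of genes already known to belong to the pathway of interest. Genes near the top of `order` that are not in `knowledge_idx` are candidate pathway members under these assumptions.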
