Gene set analysis exploiting the topology of a pathway

Background: Recently, considerable effort in microarray data analysis has been directed towards the study of so-called gene sets. A gene set is defined by genes that are, in some way, functionally related. For example, genes appearing in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge. Nowadays, many bioinformatics resources store this kind of knowledge (see, for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Although pathway maps carry important information about the correlation structure among genes, which should not be neglected, the currently available multivariate methods for gene set analysis do not fully exploit it.

Results: We propose a novel gene set analysis specifically designed for gene sets defined by pathways. This analysis, based on graphical models, explicitly incorporates the dependence structure among genes highlighted by the topology of pathways. It is designed to be used for overall surveillance of changes in a pathway across different experimental conditions. Under different circumstances, not only the expression of the genes in a pathway but also the strength of their relations may change. The methods resulting from the proposal make it possible both to test for variations in the strength of the links and to properly account for heteroscedasticity in the usual tests for differential expression.

Conclusions: The use of graphical models allows a deeper look at the components of the pathway, which can be tested separately and compared marginally. In this way it is possible to test single components of the pathway and highlight only those involved in its deregulation.
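
To make the idea of testing for "variations in the strength of the links" concrete, the following is a minimal sketch of a classical likelihood ratio test for the equality of two covariance matrices under a Gaussian assumption. It is not the method proposed in the paper: it ignores the pathway graph entirely, whereas the proposed analysis constrains the dependence structure according to the pathway topology. The function name `covariance_equality_lrt` and the simulated data are assumptions made purely for illustration, and the chi-square reference distribution is only an asymptotic approximation.

```python
import numpy as np
from scipy import stats


def covariance_equality_lrt(x1, x2):
    """Likelihood ratio test of H0: Sigma1 == Sigma2 for two Gaussian samples.

    x1, x2: arrays of shape (samples, genes), one per experimental condition.
    Illustrative only: no graph (pathway topology) constraint is imposed.
    Returns (statistic, degrees of freedom, asymptotic p-value).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]

    # Maximum likelihood covariance estimates (divide by n, not n - 1).
    s1 = np.atleast_2d(np.cov(x1, rowvar=False, bias=True))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False, bias=True))
    pooled = (n1 * s1 + n2 * s2) / (n1 + n2)

    # Log-determinants computed in a numerically stable way.
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    _, logdet_pooled = np.linalg.slogdet(pooled)

    # -2 log(likelihood ratio); asymptotically chi-square under H0.
    statistic = (n1 + n2) * logdet_pooled - n1 * logdet1 - n2 * logdet2
    df = p * (p + 1) // 2
    p_value = stats.chi2.sf(statistic, df)
    return statistic, df, p_value


# Simulated example: condition B has inflated variances relative to A.
rng = np.random.default_rng(0)
condition_a = rng.normal(size=(100, 4))
condition_b = 2.0 * rng.normal(size=(100, 4))
print(covariance_equality_lrt(condition_a, condition_b))
```

A small p-value here would suggest that the covariance structure differs between conditions, in which case a test for differential expression that assumes a common covariance matrix may be inappropriate. This is the heteroscedasticity issue the abstract refers to.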
