Bayesian gene set analysis for identifying significant biological pathways.

We propose a hierarchical Bayesian model for analyzing gene expression data to identify pathways that differentiate between two biological states (e.g., cancer vs. non-cancer or mutant vs. normal). Finding significant pathways can improve our understanding of biological processes. When the biological process of interest is related to a specific disease, a better understanding of the underlying pathways can lead to the design of more effective treatments. We apply our method to data obtained by interrogating the mutational status of p53 in 50 cancer cell lines (33 mutated and 17 normal) and identify several significant pathways with strong biological connections. We show that our approach provides a natural framework for incorporating prior biological information and that it has the best overall performance, in terms of correctly identifying significant pathways, compared to several alternative methods.
