Martini: using literature keywords to compare gene sets

Life scientists often want to compare two gene sets to gain insight into the differences between two distinct but related phenotypes or conditions. Several tools have been developed for comparing gene sets, most of which find Gene Ontology (GO) terms that are significantly over-represented in one gene set. However, such tools often return GO terms that are too generic or too few to be informative. Here, we present Martini, an easy-to-use tool for comparing gene sets. Martini is based not on GO but on keywords extracted from Medline abstracts; it also supports a much wider range of species than comparable tools. To evaluate Martini, we created a benchmark based on the human cell cycle and tested several comparable tools (CoPub, FatiGO, Marmite and ProfCom). Martini had the best benchmark performance, delivering a more detailed and accurate description of function. Martini also performed as well as or better than the other tools on three further datasets (related to Arabidopsis, melanoma and ovarian cancer), suggesting that it represents an advance in the automated comparison of gene sets. In agreement with previous studies, our results further suggest that literature-derived keywords are a richer source of gene-function information than GO annotations. Martini is freely available at http://martini.embl.de.
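
To make the idea of keyword over-representation concrete, the following is a minimal sketch of how a keyword could be tested for enrichment in one gene set relative to another. It is not Martini's implementation: the one-sided hypergeometric (Fisher-type) test, the function names `hypergeom_tail` and `enriched_keywords`, the gene-to-keyword mapping and the toy data are all assumptions made for illustration, and multiple-testing correction is omitted for brevity.

```python
from math import comb


def hypergeom_tail(k, n_a, n_b, n_annotated):
    """P(X >= k), where X is the number of set-A genes among the
    n_annotated genes carrying a keyword, if the keyword were spread
    at random over the n_a + n_b genes (one-sided test)."""
    denom = comb(n_a + n_b, n_annotated)
    upper = min(n_annotated, n_a)
    tail = sum(comb(n_a, i) * comb(n_b, n_annotated - i)
               for i in range(k, upper + 1))
    return tail / denom


def enriched_keywords(set_a, set_b, keywords_by_gene, alpha=0.05):
    """Return (keyword, p-value) pairs over-represented in set_a versus
    set_b, sorted by p-value. Assumes the two sets are disjoint.
    No multiple-testing correction is applied in this sketch."""
    set_a, set_b = set(set_a), set(set_b)
    all_keywords = set()
    for gene in set_a | set_b:
        all_keywords |= keywords_by_gene.get(gene, set())

    hits = []
    for kw in all_keywords:
        in_a = sum(kw in keywords_by_gene.get(g, set()) for g in set_a)
        in_b = sum(kw in keywords_by_gene.get(g, set()) for g in set_b)
        p = hypergeom_tail(in_a, len(set_a), len(set_b), in_a + in_b)
        if p < alpha:
            hits.append((kw, p))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical toy data: gene identifiers and keywords are invented.
toy_keywords = {
    "g1": {"mitosis", "kinase"}, "g2": {"mitosis"},
    "g3": {"mitosis"}, "g4": {"mitosis"},
    "g5": {"kinase"}, "g6": set(), "g7": set(), "g8": set(),
}
result = enriched_keywords(["g1", "g2", "g3", "g4"],
                           ["g5", "g6", "g7", "g8"],
                           toy_keywords)
print(result)
```

In this toy input, "mitosis" annotates every gene in the first set and none in the second, so it is the only keyword reported; "kinase" occurs once in each set and is not. A real analysis would also need to correct for the number of keywords tested.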
