Nucleic Acids Research Polysearch: a Web-based Text Mining System for Extracting Relationships between Human Diseases, Genes, Mutations, Drugs Polysearch: a Web-based Text Mining System for Extracting Relationships between Human Diseases, Genes, Mutations, Drugs and Metabolites

A particular challenge in biomedical text mining is to find ways of handling ‘comprehensive’ or ‘associative’ queries such as ‘Find all genes associated with breast cancer’. Given that many queries in genomics, proteomics or metabolomics involve these kind of comprehensive searches we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is ‘Given X, find all Y's’ where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein–protein interaction identification and disease gene identification using a variety of manually assembled ‘gold standard’ text corpuses. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearch

[1]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[2]  Dietrich Rebholz-Schuhmann,et al.  EBIMed - text crunching to gather facts for proteins from Medline , 2007, Bioinform..

[3]  Gudmundur A. Thorisson,et al.  The International HapMap Project Web site. , 2005, Genome research.

[4]  P. Stenson,et al.  Human Gene Mutation Database (HGMD®): 2003 update , 2003, Human mutation.

[5]  Ulf Leser,et al.  ALIBABA: PubMed as a graph , 2006, Bioinform..

[6]  A Bairoch,et al.  SWISS-PROT: connecting biomolecular knowledge via a protein database. , 2001, Current issues in molecular biology.

[7]  N. Campbell Genetic association database , 2004, Nature Reviews Genetics.

[8]  M. Rivera,et al.  Analysis of genomic and proteomic data using advanced literature mining. , 2003, Journal of proteome research.

[9]  Ying Zhang,et al.  HMDB: the Human Metabolome Database , 2007, Nucleic Acids Res..

[10]  Linda A. Watson,et al.  Information Retrieval: A Health and Biomedical Perspective. , 2005 .

[11]  R. Strausberg,et al.  Genome and genetic resources from the Cancer Genome Anatomy Project. , 2001, Human molecular genetics.

[12]  P. Stenson,et al.  Human Gene Mutation Database (HGMD , 2003 .

[13]  David L. Steffen,et al.  The Breast Cancer Gene Database: a collaborative information resource , 1999, Oncogene.

[14]  L Hunter,et al.  MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. , 1999, BioTechniques.

[15]  Betsy L. Humphreys,et al.  Technical Milestone: The Unified Medical Language System: An Informatics Research Collaboration , 1998, J. Am. Medical Informatics Assoc..

[16]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[17]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from full texts , 2004, Bioinform..

[18]  Alfonso Valencia,et al.  Implementing the iHOP concept for navigation of biomedical literature , 2005, ECCB/JBI.

[19]  Alan F. Scott,et al.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders , 2002, Nucleic Acids Res..

[20]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from the literature: Part II , 2005, Bioinform..

[21]  Thomas Werner,et al.  LitMiner and WikiGene: identifying problem-related key players of gene regulation using publication abstracts , 2005, Nucleic Acids Res..

[22]  Sue Povey,et al.  Genew: the Human Gene Nomenclature Database , 2002, Nucleic Acids Res..