Automatic Extraction of Genomic Glossary Triggered by Query

In genomic research, understanding specific gene names is the entry point to most Information Retrieval (IR) and Information Extraction (IE) systems. In this paper we present an automatic method for extracting a genomic glossary, triggered by the initial gene name in a query. Our system uses LocusLink gene names as query triggers and MEDLINE abstracts as the genomic corpus. We evaluate the extracted glossary through query expansion in the TREC 2003 Genomics Track ad hoc retrieval task, and the experimental results indicate that a recall of 90.15% can be achieved.
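
The evaluation setup described above, expanding a query with glossary terms and measuring recall, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `expand_query` and `recall`, the dictionary layout of the glossary, and the toy gene entry are all assumptions made for illustration, and the extraction step that would build the glossary from MEDLINE abstracts is not shown.

```python
from typing import Dict, List, Set


def expand_query(gene_name: str, glossary: Dict[str, List[str]]) -> List[str]:
    """Return the gene name followed by its glossary terms, without duplicates.

    `glossary` is a hypothetical mapping from a trigger gene name to the
    terms extracted for it; the real system's data structure is not
    described in the abstract.
    """
    terms = [gene_name] + glossary.get(gene_name, [])
    seen: Set[str] = set()
    expanded: List[str] = []
    for term in terms:
        key = term.lower()  # compare case-insensitively, keep original casing
        if key not in seen:
            seen.add(key)
            expanded.append(term)
    return expanded


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved set."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Toy, made-up glossary entry used only to show the call pattern.
toy_glossary = {"TP53": ["p53", "tumor protein p53", "TP53"]}
expanded_terms = expand_query("TP53", toy_glossary)
```

The expanded term list would then be submitted to a retrieval engine in place of the bare gene name, and `recall` compares the returned document identifiers against the relevance judgments.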
