Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information.

OBJECTIVE This paper describes the development and evaluation of an automatic summarization system for the domain of molecular genetics. The system is a potential component of Semantic MEDLINE, an advanced biomedical information management application, and could assist librarians in developing secondary databases of genetic information extracted from the primary literature.

METHODS An existing summarization system was modified to identify biomedical text relevant to the genetic etiology of disease. The modified system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced from records in Genetics Home Reference and Online Mendelian Inheritance in Man. Genes that the system identified in the text were compared to the gold standard, and recall, precision, and F-measure were calculated.

RESULTS The system achieved recall of 46% and precision of 88% (F-measure=0.61) when Gene References into Function (GeneRIFs) were taken into account.

CONCLUSION The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.
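
The evaluation measures named in METHODS can be illustrated with a minimal Python sketch. It assumes the balanced form of the F-measure (the harmonic mean of precision and recall), which the abstract does not specify. The function name `evaluate` and the placeholder gene labels are hypothetical and are not the study's data or code.

```python
def evaluate(found, gold):
    """Return (recall, precision, f_measure) for two collections of gene labels."""
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)  # genes both found and in the gold standard

    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(found) if found else 0.0

    if recall + precision == 0:
        return recall, precision, 0.0

    # Balanced F-measure: harmonic mean of precision and recall (an assumption).
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Placeholder labels for illustration only; not genes or results from the study.
gold_standard = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
system_output = {"GENE_A", "GENE_B", "GENE_C", "GENE_X"}

recall, precision, f_measure = evaluate(system_output, gold_standard)
print(f"recall={recall:.2f} precision={precision:.2f} F={f_measure:.2f}")
```

In this toy case the system finds three of the five gold-standard genes and reports one gene that is not in the gold standard, so recall is lower than precision, the same qualitative pattern as in RESULTS.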
