Assisted Curation: Does Text Mining Really Help?

Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence of its effectiveness. We report on three experiments measuring the extent to which curation can be sped up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of one third in curation time can be expected when NLP output is perfectly accurate. One curator preferred NLP output that is consistent and has high recall; this preference needs to be confirmed in a larger study with several curators.
