BeCAS: biomedical concept recognition services and visualization

SUMMARY The continuous growth of the biomedical scientific literature has motivated the development of text-mining tools that can process this information efficiently. Although numerous domain-specific solutions are available, there is no web-based concept-recognition system that combines the ability to select multiple concept types to annotate, to reference external databases and to automatically annotate nested and intercepted concepts. BeCAS, the Biomedical Concept Annotation System, is an API for biomedical concept identification and a web-based tool that addresses these limitations. MEDLINE abstracts or free text can be annotated directly in the web interface, where identified concepts are enriched with links to reference databases. Its customizable widget can also augment external web pages with concept highlighting features. Furthermore, all text-processing and annotation features are available through an HTTP REST API, allowing integration into any text-processing pipeline.

AVAILABILITY BeCAS is freely available for non-commercial use at http://bioinformatics.ua.pt/becas.

CONTACTS tiago.nunes@ua.pt or jlo@ua.pt.
