SIA: a scalable interoperable annotation server for biomedical named entities

Recent years have seen strong growth in biomedical research and a corresponding increase in publication volume. Extracting specific information from this literature requires sophisticated text mining and information extraction tools. However, integrating freely available tools into customized workflows is often cumbersome and difficult. We describe SIA (Scalable Interoperable Annotation Server), a scalable, extensible, and robust annotation service and our contribution to the BeCalm Technical Interoperability and Performance of annotation Servers (BeCalm-TIPS) task. The system currently covers six named entity types (chemicals, diseases, genes, miRNA, mutations, and organisms) and is freely available under the Apache 2.0 license at https://github.com/Erechtheus/sia.
