SOA-Based Integration of Text Mining Services

Text Mining has established itself as a valuable tool for knowledge extraction in many commercial and scientific areas. Accordingly, a large number of methods have been developed for a broad range of tasks. We report on a novel system architecture that is fundamentally service-based, i.e., it models and implements text mining and knowledge extraction routines as independent, yet federated services. The system has three layers: (1) Base services perform fundamental extraction tasks; they all implement a fixed interface but keep their particular algorithms and functionality. (2) A metaservice acts as a central access point to those base services, thus providing a homogeneous interface to different algorithms. (3) An aggregation service on top of the metaservice graphically shows, compares, and aggregates the results of different base services. Each layer is accessible as a Web Service and is thus ready to be integrated into applications higher up in the value chain, such as authoring tools or systems for the automatic construction of knowledge bases. We developed our system with a focus on mining Life Science text collections. It is available from http://www.bc-viscon.net.
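
The layering described above can be illustrated with a minimal in-process sketch. It is not the system's actual interface: the abstract does not specify the Web Service contract, so every name here (`Annotation`, `BaseService`, `DictionaryTagger`, `PatternTagger`, `MetaService`, `AggregationService`, and their methods) is hypothetical, and the two toy taggers merely stand in for real extraction algorithms. The sketch only shows how a fixed interface lets a metaservice fan a request out to different algorithms, and how an aggregation layer can compare their results.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True, order=True)
class Annotation:
    start: int   # character offset where the mention begins
    end: int     # character offset one past the end of the mention
    label: str   # entity type, e.g. "GENE"


class BaseService(Protocol):
    """Hypothetical fixed interface shared by all base services."""

    def annotate(self, text: str) -> list[Annotation]: ...


class DictionaryTagger:
    """Toy base service: exact lookup of known names."""

    def __init__(self, names: list[str]) -> None:
        self.names = names

    def annotate(self, text: str) -> list[Annotation]:
        found = []
        for name in self.names:
            for match in re.finditer(re.escape(name), text):
                found.append(Annotation(match.start(), match.end(), "GENE"))
        return sorted(found)


class PatternTagger:
    """Toy base service: capital letters followed by digits."""

    pattern = re.compile(r"\b[A-Z]{2,}\d+\b")

    def annotate(self, text: str) -> list[Annotation]:
        return [Annotation(m.start(), m.end(), "GENE")
                for m in self.pattern.finditer(text)]


class MetaService:
    """Central access point: one call reaches every registered base service."""

    def __init__(self) -> None:
        self.services: dict[str, BaseService] = {}

    def register(self, name: str, service: BaseService) -> None:
        self.services[name] = service

    def annotate(self, text: str,
                 names: list[str] | None = None) -> dict[str, list[Annotation]]:
        # Query all services unless the caller selects a subset by name.
        selected = names if names is not None else list(self.services)
        return {name: self.services[name].annotate(text) for name in selected}


class AggregationService:
    """Compares and merges the per-service results from the metaservice."""

    def __init__(self, meta: MetaService) -> None:
        self.meta = meta

    def agreement(self, text: str) -> Counter:
        # Count how many services reported each annotation.
        votes: Counter = Counter()
        for annotations in self.meta.annotate(text).values():
            votes.update(set(annotations))
        return votes

    def consensus(self, text: str, min_votes: int) -> list[Annotation]:
        # Keep annotations reported by at least `min_votes` services.
        votes = self.agreement(text)
        return sorted(a for a, n in votes.items() if n >= min_votes)


meta = MetaService()
meta.register("dictionary", DictionaryTagger(["BRCA1", "p53"]))
meta.register("pattern", PatternTagger())
aggregator = AggregationService(meta)

sentence = "BRCA1 interacts with TP53 and p53."
for name, annotations in meta.annotate(sentence).items():
    print(name, [sentence[a.start:a.end] for a in annotations])
print("consensus:", aggregator.consensus(sentence, min_votes=2))
```

In this toy run the two taggers disagree on `TP53` and `p53`, and only `BRCA1` survives a two-vote consensus. In a deployment along the lines the abstract describes, each class would sit behind its own Web Service endpoint rather than in one process; the in-process calls here are a simplification.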
