Theophrastus: On demand and real-time automatic annotation and exploration of (web) documents using open linked data

Theophrastus is a system that automatically annotates (web) documents through entity mining and provides exploration services by exploiting Linked Open Data (LOD), in real time and only when needed. The system aims to assist biologists in their research on species and biodiversity. Its design was driven by requirements from the biodiversity domain, and it was awarded first prize in the Blue Hackathon 2013. Theophrastus is highly configurable with respect to several aspects, such as the entities of interest, the information cards and the external search systems. As a result, it can be used in other contexts and areas of interest. The experimental results show that the proposed approach is efficient and can be applied in real time.
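The general idea of configurable entity mining combined with on-demand enrichment can be illustrated with a minimal sketch. This is not the Theophrastus implementation: the class names, the gazetteer-based matching and the resolver callables are all assumptions made for illustration. In a real deployment a resolver would typically query a LOD source (for example via SPARQL), whereas here it is any function supplied by the caller.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Annotation:
    start: int      # character offset where the match begins
    end: int        # character offset just past the match
    text: str       # the matched surface form
    category: str   # configured entity category, e.g. "species"


@dataclass
class Annotator:
    # Hypothetical configuration: category -> known entity names.
    gazetteers: Dict[str, List[str]]
    # Hypothetical configuration: category -> function returning
    # descriptive properties for an entity name (e.g. a LOD lookup).
    resolvers: Dict[str, Callable[[str], Dict[str, str]]]
    _cache: Dict[Tuple[str, str], Dict[str, str]] = field(default_factory=dict)

    def annotate(self, text: str) -> List[Annotation]:
        """Find configured entity names in the text (case-insensitive)."""
        found = []
        for category, names in self.gazetteers.items():
            for name in names:
                pattern = r"\b" + re.escape(name) + r"\b"
                for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                    found.append(
                        Annotation(m.start(), m.end(), m.group(0), category)
                    )
        return sorted(found, key=lambda a: a.start)

    def explore(self, annotation: Annotation) -> Dict[str, str]:
        """Fetch properties for one entity, only when requested."""
        key = (annotation.category, annotation.text.lower())
        if key not in self._cache:
            resolver = self.resolvers[annotation.category]
            self._cache[key] = resolver(annotation.text)
        return self._cache[key]
```

In this sketch, `annotate` performs no external lookups; a resolver is called only when `explore` is invoked for a specific entity, and repeated requests for the same entity are served from the cache. New entity categories are added by extending the two configuration dictionaries.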
