Mímir: An open-source semantic search framework for interactive information seeking and discovery

Semantic search is gradually establishing itself as the next-generation search paradigm, one that meets a wider range of information needs better than traditional full-text search. However, extending search to document structure and external, formal knowledge sources (e.g. Linked Open Data (LOD) resources) remains challenging, especially with respect to efficiency, usability, and scalability. This paper introduces Mímir, an open-source framework for integrated semantic search over text, document structure, linguistic annotations, and formal semantic knowledge. Mímir supports complex structural queries as well as basic keyword search. Exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices and term clouds. There is also an interactive retrieval interface, where users can save, refine, and analyse the results of a semantic search over time. The better-studied, precision-oriented information-seeking searches are also well supported. The generic and extensible nature of the Mímir platform is demonstrated through three different real-world applications, one of which required indexing and search over tens of millions of documents and fifty to a hundred times as many semantic annotations. Scaling up to over 150 million documents was also accomplished, via index federation and cloud-based deployment.
