Ranking Archived Documents for Structured Queries on Semantic Layers

Archived collections of documents (such as newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods remains a major hurdle to turning them into usable sources of information. A semantic layer is an RDF graph that describes metadata and semantic information about a collection of archived documents, and it can be queried through a semantic query language (SPARQL). This allows running advanced queries that combine document metadata (such as publication date) with content-based semantic information (such as the entities mentioned in the documents). However, such structured queries can return a large number of results, all of which match the query equally. In this paper, we address this problem and formalize the task of ranking archived documents for structured queries on semantic layers. We then propose two ranking models for this task that jointly consider: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the temporal relations among the entities. Experimental results on a new evaluation dataset show the effectiveness of the proposed models and allow us to understand their limitations.
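To make the ranking task concrete, the following Python sketch scores a set of documents that all match a structured query equally. It is an illustration only and not one of the two models proposed in the paper: the `Doc` class, the `relativeness` and `timeliness` functions, the month-level granularity, and the linear mixing weight `alpha` are all assumptions introduced here, and the third signal (temporal relations among entities) is left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical view of one query result: an archived document."""
    doc_id: str
    published: date
    entity_mentions: Counter  # entity identifier -> number of mentions


def relativeness(doc, query_entities):
    """Assumed proxy: share of the document's entity mentions that
    refer to the entities named in the query."""
    total = sum(doc.entity_mentions.values())
    if total == 0:
        return 0.0
    return sum(doc.entity_mentions[e] for e in query_entities) / total


def timeliness(doc, results):
    """Assumed proxy: share of the matching documents that were
    published in the same calendar month as this document."""
    month = (doc.published.year, doc.published.month)
    same = sum(1 for d in results
               if (d.published.year, d.published.month) == month)
    return same / len(results)


def rank(results, query_entities, alpha=0.5):
    """Order documents by a linear mix of the two proxies (an assumption)."""
    scored = [
        (alpha * relativeness(d, query_entities)
         + (1 - alpha) * timeliness(d, results), d.doc_id)
        for d in results
    ]
    # Sort by descending score; ties keep their original order.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: -s[0])]


results = [
    Doc("d1", date(2010, 1, 5), Counter({"A": 3, "B": 1})),
    Doc("d2", date(2010, 1, 20), Counter({"A": 1, "B": 3})),
    Doc("d3", date(2010, 3, 2), Counter({"A": 4})),
]
ranking = rank(results, {"A"})
```

In this toy input, `d3` mentions only the query entity but was published in a month with few matching documents, while `d1` is less focused but falls in the busiest month. The value of `alpha` decides which signal dominates.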
