Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn can be queried through a structured query language (e.g., SPARQL). This makes it possible to run advanced queries that combine document metadata (like publication date) with content-based semantic information (like the entities mentioned in the documents). However, the results returned by structured queries can be numerous, and all of them match the query equally well. Thus, there is a need to rank these results in order to promote the most important ones. In this paper, we focus on this problem and propose a ranking model that considers and combines: i) the relativeness of documents to entities, ii) the timeliness of documents, and iii) the relations among the entities.