Characteristics of Dataset Retrieval Sessions: Experiences from a Real-life Digital Library

Secondary analysis, or the reuse of existing survey data, is a common practice among social scientists. Searching for relevant datasets in Digital Libraries, however, is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, involves additional material such as codebooks, questionnaires, raw data files and more. Our assumption is that, due to the diverse nature of datasets, document retrieval models often do not work as efficiently for retrieving datasets. One way of enhancing these types of searches is to incorporate the users' interaction context in order to personalise dataset retrieval sessions. As a first step towards this long-term goal, we study characteristics of dataset retrieval sessions from a real-life Digital Library for the social sciences that incorporates both research data and publications. Previous studies reported that queries for document search and dataset search can be distinguished by query length. In this paper, we dispute this claim and report our finding that queries are indistinguishable, whether they aim for a dataset or a document. Among other things, we report our findings on dataset retrieval sessions with respect to query characteristics, interaction sequences and topical drift within 65,000 unique sessions.
