CoST: An Annotated Data Collection for Complex Search

While great progress has been made in the area of information access, open issues remain in the design of intelligent systems that support task-based search. Despite the importance of task-based search, the information retrieval and information science communities still lack open-ended, annotated datasets that enable the evaluation of the many related facets of search tasks in downstream applications. Existing datasets are either sampled from large-scale logs but poorly annotated, or drawn from smaller-scale user studies but focused on ranked-list evaluation. In this work, we present CoST: a novel, richly annotated dataset for evaluating complex search tasks, collaboratively designed by researchers from the computer science and cognitive psychology domains, and intended to answer a wide range of research questions dealing with task-based search. CoST includes 5667 queries recorded in 630 task-based sessions that result from a user study involving 70 native French-speaking participants, each an expert in one of three domains (computer science, medicine, psychology). Each participant completed 15 tasks spanning five types of cognitive complexity (fact-finding, exploratory learning, decision-making, problem-solving, multicriteria-inferential). In addition to search data (e.g., queries and clicks), CoST provides task- and session-related data, task annotations, and query annotations. We illustrate possible uses of CoST through the evaluation of query classification models and an analysis of the effect of task complexity and domain on users' search behavior.
