CAsT-19: A Dataset for Conversational Information Seeking

CAsT-19 is a new dataset that supports research on conversational information seeking. The corpus comprises 38,426,252 passages drawn from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. The dataset contains eighty information-seeking dialogues (30 train, 50 test), each averaging 9 to 10 questions. A dialogue may explore a topic broadly or drill down into subtopics. Questions contain ellipsis, implied context, mild topic shifts, and other characteristics of human conversation that may prevent them from being understood in isolation. Relevance assessments are provided for 30 training topics and 20 test topics. CAsT-19 promotes research on conversational information seeking by defining it as a task in which effective passage selection requires understanding a question's context (the dialogue history). It focuses attention on user modeling, analysis of prior retrieval results, transformation of questions into effective queries, and other topics that have been difficult to study with existing datasets.
