Overview of RCD-2020, the FIRE-2020 track on Retrieval from Conversational Dialogues

This paper presents an overview of the track ‘Retrieval from Conversational Dialogues’ (RCD), organized as part of the Forum for Information Retrieval Evaluation (FIRE) 2020. The motivation of the track is to develop a dataset that supports a controlled and reproducible laboratory-based experimental setup for investigating the effectiveness of conversational assistance systems. Specifically, the form of conversational assistance that this track addresses is the contextualization of certain concepts within content that is either written (e.g. in a chat system) or uttered (e.g. in an audio or video communication) by a user, and with which the other users participating in the communication are not well versed. To study the problem in a reproducible laboratory-based setting, we took a collection of four movie scripts and manually annotated spans of text that may require contextualization. The RCD track comprises two tasks: a) Task-1, where participants were required to estimate, from a given sequence of dialogue-based interactions in the script, the annotated span of text likely to benefit from contextualization; and b) Task-2, which involved retrieving a ranked list of documents corresponding to the concepts requiring contextualization. To evaluate effectiveness on Task-1, we used i) a character n-gram based variant of the BLEU score, and ii) a bag-of-words based Jaccard coefficient, which measure the overlap between the manually annotated ground truth and the automatically extracted text spans at the granularity of characters and words, respectively. To evaluate the effectiveness of the retrieved documents for Task-2, we employed two standard precision-oriented information retrieval (IR) metrics, namely precision at top-5 ranks (P@5) and mean reciprocal rank (MRR), along with a metric that is both precision- and recall-oriented, namely mean average precision (MAP).
We received a total of five submissions from a single participating team for both tasks. A general trend in the submitted runs is that unsupervised statistical approaches to term extraction and summarization from movie scripts turned out to be more effective for both tasks (i.e. query identification and retrieval) than supervised approaches, such as those based on pre-trained transformers (BERT).
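As an illustration of the evaluation measures named above, the following is a minimal sketch of the bag-of-words Jaccard coefficient (Task-1) and of the per-query quantities behind P@5, MRR and MAP (Task-2), using their textbook definitions. It is not the track's official evaluation script: the function names, the lowercasing and whitespace tokenization, and the example document identifiers are assumptions made for illustration. The character n-gram BLEU variant is omitted because its exact configuration is not specified here.

```python
from typing import Sequence, Set


def jaccard(predicted: str, reference: str) -> float:
    """Bag-of-words Jaccard overlap between two text spans.

    Assumption: spans are lowercased and split on whitespace.
    """
    a = set(predicted.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty spans are treated as identical
    return len(a & b) / len(a | b)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document occurs,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked list and relevance judgements for one query.
ranked_docs = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d2", "d5", "d9"}
```

MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all queries. In this sketch, average precision divides by the total number of relevant documents, so relevant documents that are never retrieved lower the score; this is what makes MAP sensitive to recall as well as precision.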