Zero-Shot Dialogue Disentanglement by Self-Supervised Entangled Response Selection

Dialogue disentanglement aims to group the utterances of a long, multi-participant dialogue into threads. This is useful for discourse analysis and downstream applications such as dialogue response selection, where it can serve as the first step toward constructing a clean context/response set. Unfortunately, labeling all reply-to links takes quadratic effort with respect to the number of utterances: an annotator must check all preceding utterances to identify the one to which the current utterance replies. In this paper, we are the first to propose a zero-shot dialogue disentanglement solution. First, we train a model on an unannotated multi-participant response selection dataset harvested from the web; we then apply the trained model to perform zero-shot dialogue disentanglement. Without any labeled data, our model achieves a cluster F1 score of 25. We also fine-tune the model with varying amounts of labeled data. Experiments show that with only 10% of the data, we achieve nearly the same performance as using the full dataset.
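To make the zero-shot inference step concrete, here is a minimal sketch of how reply-to scores from a response-selection model could be turned into threads. The `reply_score` callable is a hypothetical stand-in for the trained model's score that utterance i replies to an earlier utterance j (derived from self-supervised attention in the paper); the union-find clustering shown here is a generic illustration, not the authors' exact implementation.

```python
from typing import Callable, List


def disentangle(utterances: List[str],
                reply_score: Callable[[List[str], int, int], float]) -> List[int]:
    """Assign each utterance a thread id by linking it to its best-scoring parent.

    reply_score(utterances, i, j) -> float: assumed model score that utterance i
    is a reply to utterance j (j <= i; j == i means "starts a new thread").
    """
    parent = list(range(len(utterances)))  # union-find parent pointers

    def find(x: int) -> int:
        # Follow parent pointers to the thread root, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(1, len(utterances)):
        # Choose the preceding utterance (or self) with the highest reply-to score.
        best_j = max(range(i + 1), key=lambda j: reply_score(utterances, i, j))
        if best_j != i:
            parent[find(i)] = find(best_j)  # merge utterance i into its parent's thread

    # Relabel thread roots as consecutive integer thread ids.
    roots = sorted({find(i) for i in range(len(utterances))})
    root_to_thread = {r: t for t, r in enumerate(roots)}
    return [root_to_thread[find(i)] for i in range(len(utterances))]
```

The resulting thread assignments can then be compared against gold annotations with clustering metrics such as the cluster F1 score reported above.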
