Dialogue Disentanglement in Software Engineering: How Far are We?

Software chat messages contain valuable information, but disentangling them into distinct conversations is an essential prerequisite for any in-depth analysis that uses this information. To provide a better understanding of the current state of the art, we evaluate five popular dialog disentanglement approaches on software-related chat. We find that existing approaches do not perform well at disentangling software-related dialogs that discuss technical and complex topics. Further investigation of how well existing disentanglement measures reflect human satisfaction shows that these measures cannot correctly indicate human satisfaction with disentanglement results. Therefore, in this paper, we introduce and evaluate a novel measure, named DLD. Using the human satisfaction results, we further summarize the four most frequent types of bad disentanglement on software-related chat to inform future improvements: (i) Ignoring Interaction Patterns, (ii) Ignoring Contextual Information, (iii) Mixing up Topics, and (iv) Ignoring User Relationships. We believe that our findings provide valuable insights into the effectiveness of existing dialog disentanglement approaches and will promote better application of dialog disentanglement in software engineering.
