Information Bottleneck Inspired Method For Chat Text Segmentation

We present a novel technique for segmenting chat conversations using the information bottleneck method (Tishby et al., 2000), augmented with sequential continuity constraints. We also exploit non-textual cues, such as the time between two consecutive posts and mentions of people within posts. To evaluate the proposed method, we collected data from public Slack conversations and from Fresco, a proprietary platform deployed inside our organization. Experiments demonstrate that the proposed method yields an absolute (relative) improvement of up to 3.23% (11.25%). To facilitate future research, we are releasing manual annotations for segmentation on public Slack conversations.
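
To make the idea of an information bottleneck with a sequential continuity constraint concrete, the following is a minimal sketch and not the authors' implementation. It assumes the agglomerative variant of the information bottleneck (Slonim and Tishby's merge cost, a weighted Jensen-Shannon divergence), allows merges only between adjacent segments, and stops at a fixed number of segments `k`. The names `merge_cost` and `segment`, the whitespace tokenizer, and the fixed-`k` stopping rule are all illustrative assumptions; the time-gap and people-mention cues described above are deliberately left out.

```python
import math
from collections import Counter


def _kl(p, q):
    """KL divergence D(p || q) in nats; q must be nonzero wherever p is."""
    return sum(pv * math.log(pv / q[w]) for w, pv in p.items() if pv > 0)


def merge_cost(seg_a, seg_b):
    """Return (cost, merged) for two segments given as (weight, word_dist).

    cost = (w_a + w_b) * JS divergence with mixing weights proportional to
    the segment weights, i.e. the agglomerative IB merge cost.
    """
    (wa, pa), (wb, pb) = seg_a, seg_b
    total = wa + wb
    pi_a, pi_b = wa / total, wb / total
    mix = {w: pi_a * pa.get(w, 0.0) + pi_b * pb.get(w, 0.0)
           for w in set(pa) | set(pb)}
    js = pi_a * _kl(pa, mix) + pi_b * _kl(pb, mix)
    return total * js, (total, mix)


def segment(posts, k):
    """Greedily merge ADJACENT posts until only k segments remain.

    Assumption: k is supplied by the caller; the paper's own stopping
    criterion and non-textual cues are not modelled here.
    """
    n = len(posts)
    segs = []
    for i, text in enumerate(posts):
        counts = Counter(text.lower().split())
        size = sum(counts.values()) or 1  # guard against empty posts
        dist = {w: c / size for w, c in counts.items()}
        segs.append(([i], (1.0 / n, dist)))

    while len(segs) > max(k, 1):
        # Sequential continuity: only neighbours i and i+1 may be merged.
        best = min(range(len(segs) - 1),
                   key=lambda i: merge_cost(segs[i][1], segs[i + 1][1])[0])
        _, merged = merge_cost(segs[best][1], segs[best + 1][1])
        segs[best:best + 2] = [(segs[best][0] + segs[best + 1][0], merged)]
    return [ids for ids, _ in segs]


posts = [
    "deploy the server",
    "server deploy failed",
    "lunch pizza today",
    "pizza lunch now",
]
print(segment(posts, 2))
```

Restricting candidate merges to neighbouring segments is what keeps every output segment a contiguous run of posts, which is the property a linear segmentation of a chat stream requires.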

[1]  James Allan,et al.  Topic detection and tracking: event-based information organization , 2002.

[2]  Athanasios Kehagias,et al.  Linear Text Segmentation using a Dynamic Programming Algorithm , 2003, EACL.

[3]  Hitoshi Isahara,et al.  A Statistical Model for Domain-Independent Text Segmentation , 2001, ACL.

[4]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.

[5]  Thorsten Brants,et al.  Topic-based document segmentation with probabilistic latent semantic analysis , 2002, CIKM '02.

[6]  Shafiq R. Joty,et al.  Topic Segmentation and Labeling in Asynchronous Conversations , 2013, J. Artif. Intell. Res.

[7]  Micha Elsner,et al.  You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement , 2008, ACL.

[8]  Marti A. Hearst,et al.  A Critique and Improvement of an Evaluation Metric for Text Segmentation , 2002, CL.

[9]  Carolyn Penstein Rosé,et al.  Recovering Implicit Thread Structure in Newsgroup Style Conversations , 2008, ICWSM.

[10]  Costin-Gabriel Chiru,et al.  Automatic Assessment of Collaborative Chat Conversations with PolyCAFe , 2011, EC-TEL.

[11]  John D. Lafferty,et al.  Statistical Models for Text Segmentation , 1999, Machine Learning.

[12]  John Yen,et al.  Multi-task text segmentation and alignment based on weighted mutual information , 2006, CIKM '06.

[13]  Alexander A. Alemi,et al.  Text Segmentation based on Semantic Word Embeddings , 2015, ArXiv.

[14]  Micha Elsner,et al.  Disentangling Chat with Local Coherence Models , 2011, ACL.

[15]  Goran Glavas,et al.  Unsupervised Text Segmentation Using Semantic Relatedness Graphs , 2016, *SEM.

[16]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[17]  Weiming Hu,et al.  Topic Detection and Tracking for Threaded Discussion Communities , 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[18]  Marti A. Hearst.  TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages , 1997, CL.

[19]  Alan P. Schmidt.  Detection of Topic Change in IRC Chat Logs , 2003.

[20]  Lan Du,et al.  Topic Segmentation with an Ordering-Based Topic Model , 2015, AAAI.

[21]  Chris Biemann,et al.  Text Segmentation with Topic Models , 2012, Journal for Language Technology and Computational Linguistics.

[22]  Naftali Tishby,et al.  Agglomerative Information Bottleneck , 1999, NIPS.

[23]  Joemon M. Jose,et al.  Text segmentation via topic modeling: an analytical study , 2009, CIKM.

[24]  Freddy Y. Y. Choi.  Advances in domain independent linear text segmentation , 2000, ANLP.

[25]  Diana Inkpen,et al.  Getting More from Segmentation Evaluation , 2012, HLT-NAACL.

[26]  Micha Elsner,et al.  Disentangling Chat , 2010, CL.

[27]  Bowen Zhou,et al.  Pointing the Unknown Words , 2016, ACL.

[28]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[29]  José Gabriel Pereira Lopes,et al.  Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation , 2007, AAAI.

[30]  Shay B. Cohen,et al.  Conversation Trees: A Grammar Model for Topic Structure in Forums , 2015, EMNLP.

[31]  Shimei Pan,et al.  TIARA: Interactive, Topic-Based Visual Text Summarization and Analysis , 2012, TIST.

[32]  Mohsen Pourvali,et al.  A new graph based text segmentation using Wikipedia for automatic text summarization , 2012.

[33]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[34]  Shiri Gordon,et al.  Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  Lan Du,et al.  Topic Segmentation with a Structured Topic Model , 2013, NAACL.

[36]  Naftali Tishby,et al.  Document clustering using word clusters via the information bottleneck method , 2000, SIGIR '00.

[37]  Ryotaro Kamimura,et al.  Information-theoretic enhancement learning and its application to visualization of self-organizing maps , 2010, Neurocomputing.

[38]  Hideki Kozima,et al.  Text Segmentation Based on Similarity between Words , 1993, ACL.

[39]  Mateu Sbert,et al.  Image Segmentation Using Information Bottleneck Method , 2009, IEEE Transactions on Image Processing.

[40]  Fabio Valente,et al.  An Information Theoretic Approach to Speaker Diarization of Meeting Data , 2009, IEEE Transactions on Audio, Speech, and Language Processing.