Have You Lost the Thread? Discovering Ongoing Conversations in Scattered Dialog Blocks

Finding threads in textual dialogs is emerging as a need to better organize stored knowledge. We capture this need by introducing the novel task of discovering ongoing conversations in scattered dialog blocks. Our aim in this article is twofold. First, we propose a publicly available testbed for the task that sidesteps the otherwise prohibitive privacy problem of Big Personal Data: we show that theatrical plays can serve as surrogates for personal dialogs. Second, we propose a suite of computationally light learning models that can use syntactic and semantic features. With this suite, we show that models for this challenging task should include features capturing shifts in language use and, possibly, should model underlying scripts.
