Linguistic Redundancy in Twitter

In the last few years, the research community's interest in micro-blogs and social media services such as Twitter has grown rapidly. Yet so far little attention has been paid to a key characteristic of micro-blogs: their high level of information redundancy. This paper approaches the problem systematically by providing an operational definition of redundancy, which we cast in the framework of Textual Entailment Recognition. We also provide quantitative evidence of how pervasive redundancy is in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully addresses the redundancy detection task, with statistically significant improvements over baseline systems.
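Casting redundancy as textual entailment means deciding, for a pair of tweets, whether the content of one is already conveyed by the other, and that relation is directional. The sketch below illustrates only this pair-wise framing with a simple lexical-overlap baseline; it is not the system described in the paper, whose features the abstract does not detail. The stopword list, the URL-stripping tokenizer, the `coverage` and `is_redundant` helpers, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import re

# Hypothetical lexical baseline for illustration only; NOT the paper's system.
# Stopword list and threshold are assumptions chosen for the example.
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "of", "to", "in", "on", "rt"})


def content_tokens(tweet: str) -> set[str]:
    """Lowercase the tweet, drop URLs, and keep word tokens minus stopwords.

    \\w+ discards '#' and '@', so '#japan' and 'japan' count as the same token.
    """
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    return {tok for tok in re.findall(r"\w+", text) if tok not in STOPWORDS}


def coverage(text: str, hypothesis: str) -> float:
    """Fraction of the hypothesis's content tokens that also occur in text.

    A crude lexical proxy for "text entails hypothesis"; it is asymmetric.
    A hypothesis with no content tokens is treated as trivially covered.
    """
    hyp = content_tokens(hypothesis)
    if not hyp:
        return 1.0
    return len(hyp & content_tokens(text)) / len(hyp)


def is_redundant(new_tweet: str, earlier_tweet: str, threshold: float = 0.8) -> bool:
    """Flag new_tweet as redundant if earlier_tweet lexically covers its content."""
    return coverage(earlier_tweet, new_tweet) >= threshold


if __name__ == "__main__":
    long_tweet = "Earthquake of magnitude 6.8 hits Japan http://t.co/abc"
    short_tweet = "RT Earthquake hits Japan"
    print(is_redundant(short_tweet, long_tweet))  # short tweet adds nothing new
    print(is_redundant(long_tweet, short_tweet))  # long tweet adds the magnitude
```

With these example tweets, the short retweet is flagged as redundant given the longer one (full coverage), but not the other way round (only half of the longer tweet's content tokens appear in the short one). That asymmetry is the point of the entailment framing; a real system would need semantic rather than purely lexical evidence.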