AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders

We explore recursive autoencoders for SemEval 2015 Task 1: Paraphrase and Semantic Similarity in Twitter. Our paraphrase detection system learns embeddings of phrase-structure parse trees, which are then provided as input to a conventional supervised classification model. We achieve an F1 score of 0.45 on paraphrase identification and a Pearson correlation of 0.303 on semantic similarity.
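
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an F1 score over binary paraphrase labels and a Pearson correlation over similarity scores can be computed. The function names `f1_score` and `pearson` and the toy inputs are illustrative assumptions; this is not the task's official scorer, and it says nothing about how the authors produced their predictions.

```python
import math

def f1_score(gold, pred):
    """F1 for binary labels (1 = paraphrase, 0 = not a paraphrase)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # Correlation is undefined for constant input; return 0.0 by convention here.
        return 0.0
    return cov / (norm_x * norm_y)

# Toy data (hypothetical, for illustration only).
gold_labels = [1, 1, 0, 0]
pred_labels = [1, 0, 1, 0]
gold_scores = [1.0, 2.0, 3.0]
pred_scores = [2.0, 4.0, 6.0]

print(f1_score(gold_labels, pred_labels))
print(pearson(gold_scores, pred_scores))
```

F1 balances precision and recall on the paraphrase class, while Pearson correlation measures how linearly the predicted similarity scores track the gold scores, which is why the two sub-tasks are reported with different metrics.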
