Regularized and Retrofitted models for Learning Sentence Representation with Context

Vector representations of sentences are important for many text processing tasks that involve classifying, clustering, or ranking sentences. Such tasks have long relied on bag-of-words representations. In recent years, distributed representations of sentences, learned by neural models from unlabeled data, have been shown to outperform traditional bag-of-words representations. However, most existing neural models consider only the content of a sentence and disregard its relations to other sentences in its context. In this paper, we first characterize two types of context according to their scope and utility. We then propose two approaches, regularization and retrofitting, to incorporate contextual information into content-based models. We evaluate our sentence representation models in a setup where context is available when inferring sentence vectors. Experimental results demonstrate that our proposed models outperform existing models on three fundamental tasks: classifying, clustering, and ranking sentences.
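As a rough illustration of the retrofitting idea named in the title, the sketch below refines pre-trained, content-based sentence vectors so that each vector stays close to its original value while moving toward the vectors of its contextual neighbors. This is a minimal sketch assuming the standard retrofitting update (a weighted-average fixed-point iteration, as in retrofitting of word vectors to semantic lexicons); the neighbor graph, weights `alpha`/`beta`, and update rule here are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def retrofit(vectors, neighbors, alpha=1.0, beta=1.0, iters=10):
    """Retrofit content-based sentence vectors toward contextual neighbors.

    Hypothetical sketch; the paper's actual formulation may differ.

    vectors:   dict mapping sentence id -> np.ndarray (content-based vector)
    neighbors: dict mapping sentence id -> list of neighboring sentence ids
               (e.g., adjacent sentences in the same discourse)
    alpha:     weight tying each new vector to its original content vector
    beta:      weight pulling each vector toward its neighbors' vectors
    """
    new_vecs = {s: v.copy() for s, v in vectors.items()}
    for _ in range(iters):
        for s, nbrs in neighbors.items():
            nbrs = [n for n in nbrs if n in new_vecs]
            if not nbrs:
                continue
            # Coordinate update: weighted average of the original vector
            # and the current estimates of the neighbor vectors.
            total = alpha * vectors[s] + beta * sum(new_vecs[n] for n in nbrs)
            new_vecs[s] = total / (alpha + beta * len(nbrs))
    return new_vecs
```

In a discourse setting, the neighbor graph might simply link each sentence to the sentences immediately before and after it, so that the retrofitted vector of a sentence reflects both its own content and its local context.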
