In Defense of Word Embedding for Generic Text Representation

Statistical methods have shown a remarkable ability to capture semantics. The word2vec method is a frequently cited method for capturing meaningful semantic relations between words from a large text corpus. It has the advantage of not requiring any tagging while training. The prevailing view is, however, that it lacks the ability to capture semantics of word sequences and is virtually useless for most purposes, unless combined with heavy machinery. This paper challenges that view, by showing that by augmenting the word2vec representation with one of a few pooling techniques, results are obtained surpassing or comparable with the best literature algorithms. This improved performance is justified by theory and verified by extensive experiments on well studied NLP benchmarks (This work is inspired by [10]).

[1]  Rama Chellappa,et al.  Growing Regression Forests by Classification: Applications to Object Pose Estimation , 2013, ECCV.

[2]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[3]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Christopher D. Manning,et al.  Probabilistic Tree-Edit Models with Structured Latent Variables for Textual Entailment and Question Answering , 2010, COLING.

[5]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[6]  Jason Weston,et al.  Question Answering with Subgraph Embeddings , 2014, EMNLP.

[7]  Noah A. Smith,et al.  Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions , 2010, NAACL.

[8]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[9]  Iryna Gurevych,et al.  DKPro Similarity: An Open Source Framework for Text Similarity , 2013, ACL.

[10]  Lei Yu,et al.  Deep Learning for Answer Sentence Selection , 2014, ArXiv.

[11]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[12]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[13]  W. Bruce Croft,et al.  Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2013 .

[14]  Te-Won Lee Independent component analysis: theory and applications [Book Review] , 1999, IEEE Transactions on Neural Networks.

[15]  Andrew Zisserman,et al.  Fisher Vector Faces in the Wild , 2013, BMVC.

[16]  Chris Callison-Burch,et al.  Answer Extraction as Sequence Tagging with Tree Edit Distance , 2013, NAACL.

[17]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Te-Won Lee,et al.  Independent Component Analysis , 1998, Springer US.

[19]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[20]  Roberto Navigli,et al.  Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity , 2013, ACL.

[21]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[22]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[23]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[24]  Alessandro Moschitti,et al.  Automatic Feature Engineering for Answer Selection and Extraction , 2013, EMNLP.

[25]  Geoffrey E. Hinton,et al.  Three new graphical models for statistical language modelling , 2007, ICML '07.

[26]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[27]  Jan Snajder,et al.  TakeLab: Systems for Measuring Semantic Text Similarity , 2012, *SEMEVAL.

[28]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[29]  Miguel Rios UoW: Multi-task Learning Gaussian Process for Semantic Textual Similarity , 2014, SemEval@COLING.

[30]  Ming-Wei Chang,et al.  Question Answering Using Enhanced Lexical Semantic Models , 2013, ACL.

[31]  G. Agrò Maximum likelihood estimation for the exponential power function parameters , 1995 .

[32]  Xiang Zhang,et al.  Text Understanding from Scratch , 2015, ArXiv.

[33]  Peter Young,et al.  From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.

[34]  Richard I. Hartley,et al.  In Defense of the Eight-Point Algorithm , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[35]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[36]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[37]  Eneko Agirre,et al.  SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity , 2012, *SEMEVAL.

[38]  Omer Levy,et al.  Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.

[39]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[40]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Iryna Gurevych,et al.  UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures , 2012, *SEMEVAL.

[42]  I HartleyRichard In Defense of the Eight-Point Algorithm , 1997 .

[43]  Noah A. Smith,et al.  What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA , 2007, EMNLP.

[44]  Yu Qiao,et al.  Action Recognition with Stacked Fisher Vectors , 2014, ECCV.