Modelling the Combination of Generic and Target Domain Embeddings in a Convolutional Neural Network for Sentence Classification

Word embeddings have been successfully exploited in systems for NLP tasks such as parsing and text classification. Intuitively, word embeddings created from a larger corpus provide better vocabulary coverage, while word embeddings trained on a corpus related to the given task or target domain represent the semantics of terms more effectively. However, in some emerging domains (e.g. bio-surveillance using social media data), it may be difficult to find a domain corpus large enough to create effective word embeddings. To deal with this problem, we propose novel approaches that combine word embeddings created from a generic corpus with those created from a target-domain corpus. Our experimental results on sentence classification tasks show that our approaches significantly improve the performance of an existing convolutional neural network that has achieved state-of-the-art performance on several text classification tasks.
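The abstract does not spell out the combination scheme, so the sketch below shows one plausible reading: feeding the generic and target-domain embeddings of the same tokens as two input channels to a Kim (2014)-style sentence-classification CNN. The class name, hyperparameters, and the choice of PyTorch are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of one way to combine generic and target-domain
# embeddings in a sentence-classification CNN, modelled on the
# multichannel variant of Kim (2014). Names and hyperparameters are
# illustrative assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoCorpusCNN(nn.Module):
    def __init__(self, generic_vectors, domain_vectors, n_classes,
                 n_filters=100, kernel_sizes=(3, 4, 5), dropout=0.5):
        super().__init__()
        # Two lookup tables over the same vocabulary: one trained on a
        # large generic corpus (kept fixed), one on the smaller
        # target-domain corpus (fine-tuned during training). Both are
        # assumed to share the same dimensionality here.
        self.generic = nn.Embedding.from_pretrained(generic_vectors, freeze=True)
        self.domain = nn.Embedding.from_pretrained(domain_vectors, freeze=False)
        dim = generic_vectors.size(1)
        # One convolution per n-gram width, applied across both channels.
        self.convs = nn.ModuleList(
            [nn.Conv2d(2, n_filters, (k, dim)) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        # Stack the two embedding views as channels: (batch, 2, seq_len, dim).
        x = torch.stack([self.generic(token_ids), self.domain(token_ids)], dim=1)
        # Convolve, then max-pool over time for each kernel width.
        feats = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        pooled = [f.max(dim=2).values for f in feats]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))
```

An alternative reading the abstract equally admits is concatenating the two vectors per token into a single wider representation; the channel variant above instead shares each CNN filter across both embedding spaces.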
