Is Language Modeling Enough? Evaluating Effective Embedding Combinations

Universal embeddings, such as BERT or ELMo, are useful for a broad range of natural language processing tasks like text classification and sentiment analysis, while specialized embeddings exist for tasks like topic modeling or named entity disambiguation. We study whether universal embeddings can be complemented with specialized embeddings. To this end, we conduct an in-depth evaluation of nine well-known natural language understanding tasks with SentEval and extend SentEval to the medical domain with two additional tasks, including PubMedSection, a novel topic classification dataset focused on the biomedical domain. Our comprehensive analysis covers 11 tasks and combinations of six embeddings. We find that combined embeddings outperform state-of-the-art universal embeddings without any embedding fine-tuning. We observe that adding topic-model-based embeddings helps for most tasks and that differing pre-training tasks encode complementary features. Moreover, we achieve new state-of-the-art results on the MPQA and SUBJ tasks in SentEval.
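As a rough illustration of the combination strategy evaluated here, the sketch below concatenates a "universal" sentence embedding with a "specialized" one and trains a SentEval-style logistic-regression probe on the frozen features. The random vectors stand in for real encoder outputs (e.g., BERT sentence vectors and topic-model features); the dimensions and the plain-concatenation scheme are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of combining a universal embedding with a specialized one.
# Random vectors stand in for real encoder outputs; this is an illustrative
# assumption, not the authors' exact setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_sentences = 1000
universal = rng.normal(size=(n_sentences, 768))    # e.g. BERT sentence vectors
specialized = rng.normal(size=(n_sentences, 100))  # e.g. topic-model features
labels = rng.integers(0, 2, size=n_sentences)      # binary task labels

# Combine by concatenation: neither embedding is fine-tuned.
combined = np.concatenate([universal, specialized], axis=1)

# SentEval-style evaluation: a logistic-regression probe on frozen features.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, combined, labels, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```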
