A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models

Word representation has always been an important research area in natural language processing (NLP). Understanding complex text data is imperative, given that it is rich in information and can be used across a wide range of applications. In this survey, we explore different word representation models and their expressive power, from classical approaches to modern state-of-the-art (SOTA) word representation language models (LMs). We describe the variety of text representation methods and model designs that have blossomed in NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture the semantic information of the text. Such representations can then be used by various machine learning (ML) algorithms for a variety of NLP-related tasks. Finally, this survey briefly discusses the commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
