Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Twitter is a popular source for monitoring healthcare information and public disease outbreaks. However, tweets are noisy: even when appropriate keywords appear in a tweet, they do not guarantee that it is truly health-related, so traditional keyword-based classification is largely ineffective. Word embedding algorithms have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words model (CBOW), and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its contexts; to simplify computation, the context is represented by the average of all words inside the context window. However, not all words in the context window contribute equally to the prediction of the target word. Greedily incorporating every word in the window dilutes the contribution of the semantically useful words and brings noisy or irrelevant words into the learning process. Although some existing algorithms learn a weighted CBOW model, their weights are derived from pre-defined syntactic rules and ignore the task for which the embedding is learned. We propose learning weights based on each word's relative importance in the classification task, measured with the chi-square statistic. Our intuition is that such learned weights place more emphasis on words that contribute more to the downstream task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results show that embeddings learned with the proposed algorithms outperform existing techniques by a relative accuracy improvement of over 9%.
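
As a concrete illustration of the proposed weighting scheme, the sketch below (Python with NumPy) computes a chi-square score for every vocabulary word against the binary health-related label and uses the normalized scores to replace CBOW's uniform context average with a weighted one. This is a minimal sketch of the idea, not the authors' implementation: the function names chi_square_weights and weighted_context_vector, and the representation of documents as word sets, are assumptions made for this example.

```python
import numpy as np

def chi_square_weights(docs, labels, vocab):
    """Score each word by the chi-square statistic of a 2x2 contingency
    table (word present/absent vs. binary class label), then normalize.
    docs: list of word sets; labels: list of 0/1 class labels."""
    n = len(docs)
    pos = sum(labels)
    neg = n - pos
    weights = {}
    for w in vocab:
        a = sum(1 for d, y in zip(docs, labels) if w in d and y == 1)  # present, positive
        b = sum(1 for d, y in zip(docs, labels) if w in d and y == 0)  # present, negative
        c = pos - a   # absent, positive
        d_ = neg - b  # absent, negative
        num = n * (a * d_ - b * c) ** 2
        den = (a + b) * (c + d_) * (a + c) * (b + d_)
        weights[w] = num / den if den > 0 else 0.0
    total = sum(weights.values()) or 1.0
    return {w: s / total for w, s in weights.items()}

def weighted_context_vector(context_words, embeddings, weights):
    """Weighted average of context embeddings, replacing the uniform
    mean used by standard CBOW."""
    vecs, ws = [], []
    for w in context_words:
        if w in embeddings:
            vecs.append(embeddings[w])
            ws.append(weights.get(w, 0.0))
    if not vecs:
        return None
    ws = np.asarray(ws, dtype=float)
    if ws.sum() == 0:
        ws = np.ones_like(ws)  # fall back to plain CBOW averaging
    return np.average(np.stack(vecs), axis=0, weights=ws / ws.sum())
```

In the full model, these weights would scale the context vectors inside the CBOW training loop itself (e.g., within a word2vec-style implementation), so that high chi-square words dominate the prediction of the target word, rather than being applied only after training as in this standalone sketch.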
