SentiHealth: creating health-related sentiment lexicon using hybrid approach

The exponential increase in health-related online reviews has played a pivotal role in the development of sentiment analysis systems for extracting and analyzing user-generated health reviews about a drug or medication. Existing general-purpose opinion lexicons, such as SentiWordNet, have limited coverage of health-related terms, which creates problems for the development of health-based sentiment analysis applications. In this work, we present a hybrid approach to creating a health-related, domain-specific lexicon for the efficient classification and scoring of users' health-related sentiments. The proposed approach is based on a bootstrapping model, a dataset of health reviews, and corpus-based sentiment detection and scoring. In each iteration, the vocabulary of the lexicon is updated automatically from an initial seed cache, irrelevant words are filtered out, words are marked as medical or non-medical entries, and finally a sentiment class and score are assigned to each word. The results obtained demonstrate the efficacy of the proposed technique.
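The iteration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the `MEDICAL_TERMS` set (a hypothetical stand-in for a medical vocabulary lookup), the `polarity_score` function, and the `threshold` parameter are all assumptions made for illustration, and the smoothed log-ratio of co-occurrence counts is only one simple form of corpus-based scoring.

```python
import math

# Assumption: a tiny stop-word list stands in for the filtering of irrelevant words.
STOPWORDS = {"the", "a", "and", "from", "of", "my", "it", "was", "is"}
# Assumption: a hypothetical stand-in for a medical vocabulary lookup.
MEDICAL_TERMS = {"headache", "nausea", "rash", "dizziness", "relief"}


def polarity_score(word, docs, pos_words, neg_words):
    """Smoothed log-ratio of reviews where `word` co-occurs with
    positive versus negative lexicon entries."""
    pos_hits = sum(1 for d in docs if word in d and d & pos_words)
    neg_hits = sum(1 for d in docs if word in d and d & neg_words)
    return math.log2((pos_hits + 1) / (neg_hits + 1))


def make_entry(word, score):
    """Build a lexicon entry with score, sentiment class, and medical flag."""
    return {
        "score": score,
        "label": "positive" if score > 0 else "negative",
        "medical": word in MEDICAL_TERMS,
    }


def bootstrap_lexicon(reviews, pos_seeds, neg_seeds, iterations=3, threshold=1.0):
    """Grow a sentiment lexicon from seed words over a review corpus."""
    docs = [set(text.lower().split()) for text in reviews]
    lexicon = {w: make_entry(w, 1.0) for w in pos_seeds}
    lexicon.update({w: make_entry(w, -1.0) for w in neg_seeds})

    for _ in range(iterations):
        pos_words = {w for w, e in lexicon.items() if e["score"] > 0}
        neg_words = {w for w, e in lexicon.items() if e["score"] < 0}
        # Candidate words: corpus vocabulary minus known entries and stop words.
        candidates = set().union(*docs) - set(lexicon) - STOPWORDS
        added = {}
        for word in candidates:
            score = polarity_score(word, docs, pos_words, neg_words)
            if abs(score) >= threshold:  # keep only clearly polar words
                added[word] = make_entry(word, score)
        if not added:  # nothing new was learned, so stop early
            break
        lexicon.update(added)
    return lexicon


reviews = [
    "great relief from headache",
    "great relief and good sleep",
    "terrible nausea and bad rash",
    "bad nausea terrible dizziness",
]
lexicon = bootstrap_lexicon(reviews, {"good", "great"}, {"bad", "terrible"})
for word in sorted(lexicon):
    print(word, lexicon[word])
```

In this toy run, words that appear only alongside positive seeds (such as "relief") receive a positive score, and words that appear only alongside negative seeds (such as "nausea") receive a negative one. A real system would replace the hard-coded sets with proper lexical and medical resources.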
