Semi-Automatic Identification of Seed Words for the Analysis of Sentiment and Its Intensity

ABSTRACT. To exploit the opinions expressed in tweets, this article presents a classification of tweets according to the sentiment they contain. First, we present a method for identifying new seed words. These seed words are used to predict the sentiment intensity of the words and short phrases that co-occur with them. Then, to compute sentiment similarity, we apply two approaches: similarity measures between two words, and the cosine similarity between word embedding representations (e.g. word2vec, GloVe). The experimental results highlight the importance of using seed words adapted to tweets, as well as the importance of corpus size and preprocessing. The best results were obtained with the cosine similarity between word embedding representations.

KEYWORDS: Seed words, Twitter, Similarity measure, Word embedding, Word2vec, GloVe.
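As an illustration of the embedding-based approach, the following minimal sketch scores a word by comparing its vector with positive and negative seed words using cosine similarity. The toy vectors, the seed lists, and the scoring rule (mean similarity to positive seeds minus mean similarity to negative seeds) are assumptions made for this example; they are not the authors' actual seed words, embeddings, or formula.

```python
import math

# Toy 3-dimensional vectors standing in for trained word2vec or GloVe
# embeddings. All words and values here are hypothetical.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "sad": [-0.8, 0.2, 0.1],
    "great": [0.7, 0.3, 0.1],
    "awful": [-0.7, 0.3, 0.1],
}

# Hypothetical seed words; the article identifies its own tweet-adapted seeds.
POSITIVE_SEEDS = ["good", "happy"]
NEGATIVE_SEEDS = ["bad", "sad"]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(vec, seeds):
    """Average cosine similarity between a vector and a list of seed words."""
    return sum(cosine_similarity(vec, EMBEDDINGS[s]) for s in seeds) / len(seeds)


def sentiment_intensity(word):
    """Assumed scoring rule: positive-seed affinity minus negative-seed affinity."""
    vec = EMBEDDINGS[word]
    return mean_similarity(vec, POSITIVE_SEEDS) - mean_similarity(vec, NEGATIVE_SEEDS)


if __name__ == "__main__":
    for w in ("great", "awful"):
        print(w, round(sentiment_intensity(w), 3))
```

With real embeddings, the dictionary would be replaced by vectors trained on a tweet corpus. A score above zero suggests the word lies closer to the positive seeds, and a score below zero suggests it lies closer to the negative seeds.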
