Determining the affective valence of terms in large text corpora

The aim of this research is to develop an information extraction method that uses large text corpora to estimate the affective valence associated with any term. Our approach combines two techniques: latent semantic analysis (LSA) and the determination of the emotional content of a text from the words that compose it. A preliminary study designed to evaluate this approach was conducted on a corpus of several thousand articles published in a Belgian newspaper. A first analysis showed that, by combining LSA with a dictionary of 3000 words, it is possible to efficiently approximate the affective valence of words on the basis of the words associated with them in the semantic space. A second analysis applied the technique to firm names. We conclude by proposing some improvements to the technique.
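The idea of estimating a term's valence from its neighbours in the semantic space can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the procedure reported in the study: the two-dimensional vectors stand in for an LSA space, the valence values and their scale are invented, and the choice of averaging the `k` nearest dictionary words by cosine similarity is only one plausible reading of "the words associated with them". The names `cosine` and `estimate_valence` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_valence(term, vectors, valence_dict, k=2):
    """Mean valence of the k dictionary words closest to `term`.

    Assumption: unweighted mean over the k nearest neighbours; the
    original study may weight or select neighbours differently.
    """
    target = vectors[term]
    scored = [
        (cosine(target, vectors[word]), valence_dict[word])
        for word in valence_dict
        if word != term and word in vectors
    ]
    if not scored:
        raise ValueError("no dictionary word is present in the semantic space")
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    return sum(valence for _, valence in top) / len(top)


# Toy stand-in for an LSA space (invented vectors, not real data).
vectors = {
    "joy": [0.9, 0.1],
    "gift": [0.8, 0.3],
    "war": [0.1, 0.9],
    "loss": [0.2, 0.8],
    "acme": [0.85, 0.2],  # hypothetical firm name with no dictionary entry
}

# Toy valence dictionary (invented values on an assumed 1-to-7 scale).
valence_dict = {"joy": 6.5, "gift": 6.0, "war": 1.5, "loss": 2.0}

estimate = estimate_valence("acme", vectors, valence_dict, k=2)
print(estimate)
```

In this toy space the hypothetical firm name "acme" lies closest to "joy" and "gift", so its estimate is the mean of their two dictionary values. In practice the vectors would come from an LSA decomposition of the corpus, and the number of neighbours would need to be tuned.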
