Paper Information - Extracting Semantics from the Enron Corpus

Extracting Semantics from the Enron Corpus

Indirect measures must be used when analysing attitudes, as individuals are unlikely to voluntarily express beliefs that run counter to social norms. The Implicit Association Test (IAT) indirectly assesses attitudes through the automatic association of concepts and attributes; however, it requires strict control of extraneous influences. This paper proposes an alternative indirect measure of attitudes: a semantic space built from the way in which words are used in language. To demonstrate the use of semantic spaces, the Enron corpus is analysed to discover whether any cultural attitudes can be observed. In the preprocessing stage, the corpus is tokenised and lemmatised, and information irrelevant to semantic analysis is removed. The Enron Semantic Space is then created from the corpus, incorporating multiple features from Hyperspace Analogue to Language (HAL), Latent Semantic Analysis (LSA) and Lowe and McDonald’s Semantic Space (LMS). A free association test is then introduced to measure how accurately the system can observe direct cognitive priming. Features from LMS and LSA are selected over HAL in the optimum implementation, as they give the best accuracy of 86.86% on the free association test. The same features are also shown to be able to observe graded and mediated priming. An application is then presented that allows a user to create an Enron Semantic Space from scratch and to compare the similarities of concepts and attributes found in the space. Using this application, numerous attitude experiments are conducted. Life words are found to be associated with pleasant words and death words with unpleasant words. Enron is also found to be more similar to pleasant words than Dynegy is. Competence words are found to be associated with youth words and incompetence words with elderly words. Furthermore, career words are found to be associated with male words and family words with female words.
Finally, we conclude that the results support the argument for using a semantic space to analyse attitudes; however, supplementary studies need to be conducted to replicate the exact experiments conducted with the IAT.
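
The attitude experiments described above compare how similar a set of concept words is to two opposing sets of attribute words. The following Python sketch illustrates that general idea under simplifying assumptions: it uses raw windowed co-occurrence counts (loosely in the spirit of HAL) and cosine similarity, not the LMS and LSA features selected in the paper. The function names, the window size and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def build_space(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    Assumption: raw counts are used as vector components; no weighting or
    dimensionality reduction is applied.
    """
    space = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(space, targets, attributes):
    """Average cosine similarity over every (target, attribute) pair."""
    pairs = [(t, a) for t in targets for a in attributes]
    return sum(cosine(space[t], space[a]) for t, a in pairs) / len(pairs)


def association_difference(space, targets, attrs_a, attrs_b):
    """Positive when the targets sit closer to attrs_a than to attrs_b."""
    return (mean_similarity(space, targets, attrs_a)
            - mean_similarity(space, targets, attrs_b))


# Hypothetical toy corpus, already tokenised; not taken from the Enron data.
toy_corpus = [
    ["life", "is", "happy", "and", "good"],
    ["death", "is", "sad", "and", "bad"],
    ["happy", "life", "good", "day"],
    ["sad", "death", "bad", "day"],
]

space = build_space(toy_corpus)
diff = association_difference(space, ["life"], ["happy", "good"], ["sad", "bad"])
print(f"life: pleasant minus unpleasant similarity = {diff:.3f}")
```

On this toy data the difference is positive, meaning "life" lies closer to the pleasant words than to the unpleasant ones. A real experiment would need the preprocessing and feature choices described in the abstract before such a difference could be interpreted.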

T. Macfarlane