Extracting Social Power Relationships from Natural Language

Sociolinguists have long argued that social context shapes language use in many ways, giving rise to distinct language varieties, or lects. This paper explores a text classification problem we call lect modeling, an instance of what has been termed computational sociolinguistics. In particular, we use machine learning techniques to identify social power relationships between members of a social network based purely on the content of their interpersonal communication. Rather than relying on language-specific engineering, we use statistical methods to extract features that capture the vocabulary and grammar usage characteristic of a social power lect. We then apply support vector machines to model the social power lects of superior-subordinate communication in the Enron email corpus. Our results validate treating lect modeling as a text classification problem, albeit a hard one, and make a case for further research in computational sociolinguistics.
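The abstract does not spell out the feature set or the SVM solver, so the following is only a minimal sketch of the general pipeline under stated assumptions. It assumes word unigram and bigram counts as the statistical features (grammar-level features such as part-of-speech n-grams would need a tagger and are omitted). It also trains a linear SVM with the Pegasos stochastic subgradient method on the hinge loss, a swapped-in solver rather than the authors' own setup. The function names (`ngram_features`, `train_linear_svm`, `predict`) and the toy messages are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngram_features(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams (n = 1..n_max) in a lowercased, whitespace-tokenized message."""
    tokens = text.lower().split()
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def dot(w: Dict[str, float], x: Counter) -> float:
    """Sparse dot product; features missing from w count as weight 0."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(data: List[Tuple[Counter, int]],
                     lam: float = 0.01, epochs: int = 50) -> Dict[str, float]:
    """Pegasos subgradient descent on (lam/2)*||w||^2 + mean hinge loss.

    Labels must be +1 or -1. There is no bias term, and examples are visited
    in a fixed order (not sampled at random) so that runs are reproducible.
    """
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos step size 1/(lam*t)
            margin = y * dot(w, x)          # evaluated with the pre-update weights
            for f in w:                     # regularizer step: scale w by (1 - eta*lam)
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                # hinge loss is active: step toward y*x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w: Dict[str, float], text: str) -> int:
    """Return +1 (upward) or -1 (downward); a score of exactly 0 maps to +1."""
    return 1 if dot(w, ngram_features(text)) >= 0 else -1


# Hypothetical toy messages invented for illustration, not taken from the Enron corpus.
# +1 = written upward (to a superior), -1 = written downward (to a subordinate).
TOY = [
    ("could you please review my draft", 1),
    ("please let me know your thoughts", 1),
    ("send that report by friday", -1),
    ("get this done today", -1),
]

if __name__ == "__main__":
    weights = train_linear_svm([(ngram_features(t), y) for t, y in TOY])
    print(predict(weights, "please review"))  # 1
    print(predict(weights, "send it today"))  # -1
```

Because the two toy classes share no vocabulary, every learned weight keeps the sign of its class, so the held-out phrases are labeled as the comments show. Real superior-subordinate email is far noisier. For large vocabularies, Pegasos implementations usually store the weight vector as a scalar times a vector, so the shrink step costs O(1) instead of touching every weight.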
