Email Formality in the Workplace: A Case Study on the Enron Corpus

Email is an important means of communication in daily life, and it has become the subject of various NLP and social studies. In this paper, we focus on email formality and explore the factors that may affect a sender's choice of formality. As a case study, we use the Enron email corpus to test how formality is affected by social distance, relative power, and the weight of imposition, as defined in Brown and Levinson's model of politeness (1987). Our experiments show that their model largely holds in the Enron corpus. We believe that the methodology proposed in this paper can be applied to other social media domains and used to test other linguistic or social theories.

[1]  Jafar Adibi,et al.  The Enron Email Dataset Database Schema and Brief Statistical Report , 2004 .

[2]  Ewan Klein,et al.  Natural Language Processing with Python , 2009 .

[3]  William W. Cohen,et al.  Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text , 2005, HLT.

[4]  Andrew McCallum,et al.  Extracting social networks and contact information from email and the Web , 2004, CEAS.

[5]  J. Nunamaker,et al.  Automating Linguistics-Based Cues for Detecting Deception in Text-Based Asynchronous Computer-Mediated Communications , 2004 .

[6]  Yiming Yang,et al.  The Enron Corpus: A New Dataset for Email Classification Research , 2004 .

[7]  William W. Cohen,et al.  Improving “Email Speech Acts” Analysis via N-gram Selection , 2006, HLT-NAACL 2006.

[8]  Andrew McCallum,et al.  Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email , 2007, J. Artif. Intell. Res.

[9]  W. Orlikowski,et al.  Genre Repertoire: The Structuring of Communicative Practices in Organizations , 1994 .

[10]  Penelope Brown,et al.  Politeness: Some Universals in Language Usage , 1989 .

[11]  William W. Cohen,et al.  Discovering Leadership Roles in Email Workgroups , 2007, CEAS.

[12]  Richard L. Daft,et al.  Organizational information requirements, media richness and structural design , 1986 .

[13]  J. Searle.  Expression and Meaning: A taxonomy of illocutionary acts , 1975 .

[14]  P. Keila,et al.  Detecting Unusual and Deceptive Communication in Email , 2005 .

[15]  Terrill L. Frantz,et al.  Communication Networks from the Enron Email Corpus “It's Always About the People. Enron is no Different” , 2005, Comput. Math. Organ. Theory.

[16]  Anton Leuski.  Email is a stage: discovering people roles from email archives , 2004, SIGIR '04.

[17]  Tom M. Mitchell,et al.  Learning to Classify Email into “Speech Acts” , 2004, EMNLP.

[18]  Louise Guthrie,et al.  Towards the Orwellian Nightmare: Separation of Business and Personal Emails , 2006, ACL.