The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

A writer's style depends not just on personal traits but also on her intent and mental state. In this paper, we show how variants of the same writing task can lead to measurable differences in writing style. We present a case study based on the story cloze task (Mostafazadeh et al., 2016a), where annotators were assigned similar writing tasks with different constraints: (1) writing an entire story, (2) adding a story ending for a given story context, and (3) adding an incoherent ending to a story. We show that a simple linear classifier informed by stylistic features can distinguish among the three cases without even looking at the story context. In addition, combining our stylistic features with language model predictions reaches state-of-the-art performance on the story cloze challenge. Our results demonstrate that different task framings can dramatically affect the way people write.
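To make the idea of a context-free stylistic classifier concrete, here is a minimal sketch in pure Python. It is not the system described in the paper: the feature set (ending length, character 4-grams, word unigrams), the perceptron learner, the two-class setup (coherent vs. incoherent ending), and the toy endings in `TOY_DATA` are all assumptions made for illustration. The point it shows is only that the classifier sees the ending text and never the story context.

```python
from collections import Counter


def style_features(ending: str) -> Counter:
    """Sparse stylistic features of an ending; the story context is never used.

    The feature choice (length, character 4-grams, word unigrams) is an
    assumption for this sketch, not the paper's documented feature set.
    """
    feats = Counter()
    text = ending.lower()
    tokens = text.split()
    feats["bias"] = 1.0
    feats["len"] = len(tokens) / 10.0  # scaled token count
    padded = f" {text} "
    for i in range(len(padded) - 3):
        feats["c4=" + padded[i:i + 4]] += 1.0
    for tok in tokens:
        feats["w=" + tok.strip(".,!?")] += 1.0
    return feats


def score(weights: Counter, feats: Counter) -> float:
    """Linear score: dot product of the weight and feature vectors."""
    return sum(weights[name] * value for name, value in feats.items())


def train_perceptron(examples, epochs: int = 100) -> Counter:
    """Train a binary linear classifier with the perceptron update rule."""
    weights = Counter()
    for _ in range(epochs):
        mistakes = 0
        for text, label in examples:  # label is +1 or -1
            feats = style_features(text)
            if label * score(weights, feats) <= 0:
                for name, value in feats.items():
                    weights[name] += label * value
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


def predict(weights: Counter, ending: str) -> int:
    """Return +1 (coherent-style ending) or -1 (incoherent-style ending)."""
    return 1 if score(weights, style_features(ending)) > 0 else -1


# Hypothetical toy endings invented for this sketch (not ROC Stories data).
TOY_DATA = [
    ("She felt proud and went home happy.", 1),
    ("He thanked his friend for the help.", 1),
    ("They decided to visit again next summer.", 1),
    ("She hated dogs.", -1),
    ("He threw the cake away.", -1),
    ("They never liked music.", -1),
]

weights = train_perceptron(TOY_DATA)
```

Calling `predict(weights, some_ending)` returns a label from the ending alone. With six invented examples the model can only memorize its training set, so the sketch illustrates the mechanism rather than any result; a three-way version would need one weight vector per writing task.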
