Using syntactic features to predict author personality from text

The style in which a text is written reflects an array of meta-information concerning the text (e.g., topic, register, genre) and its author (e.g., gender, region, age, personality). The field of stylometry addresses these aspects of style. A successful methodology, borrowed from text categorisation research, takes a two-stage approach: (i) automatically select the features with the highest predictive value for the categories to be learned, and (ii) use machine learning algorithms to learn to categorise new documents from the selected features (Sebastiani, 2002). Selecting linguistic features, rather than terms or n-grams of terms, requires robust and accurate text analysis tools. Language technology has recently progressed to the point where it has become feasible to study systematically how these linguistic properties vary across texts by different authors, time periods, regiolects, genres, registers, or even genders.
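The two-stage approach can be illustrated with a minimal sketch. It is not the method of this work and rests on several assumptions: documents are taken to be already POS-tagged (the tagger is outside the sketch), POS-tag bigrams stand in for syntactic features, chi-square is used as the feature-selection criterion in stage (i), and stage (ii) is an unweighted k-nearest-neighbour classifier with the overlap distance, in the spirit of memory-based learning. All function names, the toy tag sequences and the labels "E"/"I" are hypothetical and invented purely for illustration.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Set of POS-tag n-grams (as tuples) occurring in one tagged document."""
    return {tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)}


def chi_square(feature, docs, labels, target):
    """Chi-square statistic of the 2x2 table 'feature present' x 'label == target'."""
    n11 = n10 = n01 = n00 = 0
    for feats, label in zip(docs, labels):
        present, positive = feature in feats, label == target
        if present and positive:
            n11 += 1
        elif present:
            n10 += 1
        elif positive:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Stage (i): keep the k features with the highest chi-square score for any class."""
    vocab = set().union(*docs)
    classes = set(labels)
    score = {f: max(chi_square(f, docs, labels, c) for c in classes) for f in vocab}
    # Ties are broken alphabetically so that the selection is deterministic.
    return sorted(vocab, key=lambda f: (-score[f], f))[:k]


def to_vector(feats, selected):
    """Binary presence/absence vector over the selected features."""
    return tuple(f in feats for f in selected)


def classify(vector, memory, k=1):
    """Stage (ii): k-nearest-neighbour vote using the overlap distance
    (number of mismatching feature values), without feature weighting."""
    def distance(item):
        return sum(a != b for a, b in zip(vector, item[0]))
    nearest = sorted(memory, key=distance)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, invented data: POS-tagged "documents" with hypothetical MBTI-style labels.
train = [
    (["PRP", "VBP", "RB", "JJ"], "E"),
    (["PRP", "VBP", "RB", "RB"], "E"),
    (["DT", "NN", "VBZ", "JJ"], "I"),
    (["DT", "NN", "VBZ", "NN"], "I"),
]
docs = [pos_ngrams(tags) for tags, _ in train]
labels = [label for _, label in train]
selected = select_features(docs, labels, k=4)
memory = [(to_vector(d, selected), y) for d, y in zip(docs, labels)]
print(classify(to_vector(pos_ngrams(["PRP", "VBP", "RB", "NN"]), selected), memory))  # E
```

Scoring each feature one-vs-rest per class and keeping the maximum is one common way of turning per-class scores into a single ranking; information gain or another criterion could be swapped in at stage (i) without changing stage (ii).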

[1] Shlomo Argamon et al. Authorship attribution with thousands of candidate authors. SIGIR, 2006.

[2] P. Costa et al. Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 1989.

[3] Alastair J. Gill et al. Taking Care of the Linguistic Features of Extraversion. Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society, 2002.

[4] J. Pennebaker et al. The Secret Life of Pronouns. Psychological Science, 2003.

[5] Walter Daelemans et al. Memory-Based Language Processing. Studies in Natural Language Processing, 2009.

[6] Jon Oberlander et al. Identifying more bloggers: Towards large scale personality classification of personal weblogs. ICWSM, 2007.

[7] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 2002.

[8] Erik F. Tjong Kim Sang et al. Memory-Based Shallow Parsing. J. Mach. Learn. Res., 2002.

[9] J. Pennebaker et al. Lexical Predictors of Personality Type. 2005.

[10] Alastair J. Gill. Personality and language: the projection and perception of personality in computer-mediated communication. 2004.

[11] S. Wineburg. Historical Problem Solving: A Study of the Cognitive Processes Used in the Evaluation of Documentary and Pictorial Evidence. 1991.

[12] Efstathios Stamatatos et al. Computer-Based Authorship Attribution Without Lexical Measures. Comput. Humanit., 2001.

[13] Myers et al. Gifts Differing: Understanding Personality Type. 1980.

[14] Marilyn A. Walker et al. Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text. J. Artif. Intell. Res., 2007.

[15] Walter Daelemans et al. Memory-Based Language Processing (Studies in Natural Language Processing). 2005.