Building a Corpus for Personality-dependent Natural Language Understanding and Generation

The computational treatment of human personality, both for the recognition of personality traits from text and for the generation of text that reflects a particular set of traits, is central to the development of NLP applications. To provide a basic resource for studies of this kind, this article describes the b5 corpus, a collection of controlled and free (non-topic-specific) texts produced in different communicative tasks (e.g., referential or descriptive), accompanied by personality inventories of their authors and additional demographic information. The discussion focuses mainly on the corpus components and on the data collection task itself, but preliminary results of personality recognition from text are also presented to illustrate how the corpus data may be reused. The b5 corpus aims to support a wide range of NLP studies based on personality information and is, to the best of our knowledge, the largest resource of this kind made available for research purposes in Brazilian Portuguese.
