Stylometric Analysis of Parliamentary Speeches: Gender Dimension

The relation between gender and language has been studied by many authors; however, some uncertainty remains regarding the influence of gender on language use in professional environments. Often, the data sets studied are too small, or the texts of individual authors are too short, to capture gender-related differences in language use successfully. This study draws on a larger corpus of speech transcripts from the Lithuanian Parliament (1990–2013) to explore gender differences in the language of political debates via stylometric analysis. The experimental setup combines stylistic features that indicate lexical style and do not require external linguistic tools, namely the most frequent words, with unsupervised machine learning algorithms. Results show that gender differences in language use persist in the professional environment, not only in the use of function words and preferred linguistic constructions, but also in the topics presented.
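As a rough illustration of such a setup, the sketch below computes relative frequencies of the most frequent words, standardizes them, and groups texts without using any labels. The abstract does not name the distance measure or clustering algorithm, so the use of Burrows's Delta and average-linkage agglomerative clustering here is an assumption, as are all function names, the naive tokenizer, and the English toy sentences (which are not taken from the corpus).

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Naive tokenizer (assumption): split on whitespace, lowercase, trim punctuation.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def mfw_features(texts, n_words=5):
    # Relative frequencies of the n_words most frequent words of the whole collection.
    # Assumes every text contains at least one token.
    token_lists = [tokenize(t) for t in texts]
    totals = Counter(tok for toks in token_lists for tok in toks)
    vocab = [w for w, _ in totals.most_common(n_words)]
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        rows.append([counts[w] / len(toks) for w in vocab])
    return vocab, rows

def z_scores(rows):
    # Standardize each word's frequency across texts; constant columns keep sd = 1.
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def delta(a, b):
    # Burrows's Delta: mean absolute difference of z-scores (assumed measure).
    return mean(abs(x - y) for x, y in zip(a, b))

def average_linkage(z):
    # Agglomerative clustering: repeatedly merge the two closest clusters.
    clusters = [[i] for i in range(len(z))]
    merges = []
    while len(clusters) > 1:
        def dist(pair):
            a, b = pair
            return mean(delta(z[i], z[j]) for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=dist)
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical toy input: four short "speeches".
texts = [
    "we must and we will support the bill and the budget",
    "we will and we must support the budget and the bill",
    "i think that i would not oppose it if i may",
    "i would think that it is not what i may oppose",
]
vocab, rows = mfw_features(texts)
merges = average_linkage(z_scores(rows))
for left, right in merges:
    print("merge", left, "+", right)
```

The merge order plays the role of a dendrogram: texts joined early have similar most-frequent-word profiles. In an analysis along these lines, one would then check whether the resulting groups align with speaker gender, which the clustering itself never sees.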
