Extracting speaker-specific functional expressions from political speeches using random forests in order to investigate speakers' political styles

In this study, we extracted speaker-specific functional expressions from political speeches using random forests in order to investigate the speakers' political styles. As its methods have developed, stylistics has expanded beyond conventional areas such as authorship attribution and genre-based text classification into new areas of application such as authorship profiling and sentiment analysis. Among these, computational sociolinguistics, which aims to provide a systematic and solid basis for sociolinguistic analysis using machine learning and linguistically motivated features, is a potentially important area. We showed the effectiveness of the random forests classifier for such tasks by applying it to Japanese prime ministers' Diet speeches. The results demonstrated that our method successfully extracted the speaker-specific expressions of two Japanese prime ministers and enabled us to investigate their political styles in a systematic manner. The method can be applied to the sociolinguistic analysis of various other types of text, and in this way the study contributes to the development of computational sociolinguistics. © 2009 Wiley Periodicals, Inc.
