Lexical Analysis of US Political Speeches

Abstract: This article describes a US political corpus comprising 245 speeches given by Senators John McCain and Barack Obama during 2007–2008. We present the main characteristics of this collection and compare the common English words these political leaders use most frequently with ordinary usage, as represented by the Brown corpus. We then discuss and compare several metrics for extracting the terms that best characterize a given subset of the corpus. Finally, we identify the terms overused and underused by the two candidates during the 2008 US presidential election campaign and analyse them from both a statistical and a dynamic perspective.
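To make the idea of overused and underused terms concrete, the sketch below computes a standardized Z score for each term in a subset of a corpus. This is one common choice of metric, not necessarily the one retained in the article. The function name `z_scores`, the toy token lists, and the binomial assumption are all illustrative and do not come from the article.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Return a Z score per term for a subset of a corpus.

    Assumption (illustrative only): the subset is part of the corpus, and
    each term's count in the subset follows a binomial distribution whose
    probability is estimated from the whole corpus.
    """
    subset_counts = Counter(subset_tokens)
    corpus_counts = Counter(corpus_tokens)
    n_subset = len(subset_tokens)
    n_corpus = len(corpus_tokens)

    scores = {}
    for term, tf_corpus in corpus_counts.items():
        p = tf_corpus / n_corpus                  # estimated term probability
        expected = n_subset * p                   # expected count in the subset
        sd = math.sqrt(n_subset * p * (1 - p))    # binomial standard deviation
        if sd > 0:                                # skip degenerate cases
            scores[term] = (subset_counts[term] - expected) / sd
    return scores


# Hypothetical toy data, not taken from the speeches.
speaker_a = "change hope change economy".split()
speaker_b = "war economy war security".split()
scores = z_scores(speaker_a, speaker_a + speaker_b)

# Positive scores suggest overuse by speaker A, negative scores underuse.
for term, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {z:+.2f}")
```

A term is usually flagged as overused or underused only when its score exceeds a chosen threshold in absolute value. The threshold is a design choice and is left open here.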
