Analyzing Writing Styles with Coh-Metrix

Computer scientists, linguists, stylometricians, and cognitive scientists have successfully divided corpora into modes, domains, genres, registers, and authors. These successes, however, are often limited by the insufficient indices available for analyzing the corpora. In this paper, we use Coh-Metrix, a computational tool that analyzes texts on over 200 indices of cohesion and difficulty. We demonstrate how, with the benefit of statistical analysis, texts can be analyzed for subtle yet meaningful differences. Specifically, we report evidence that authors writing within the same register can be computationally distinguished, even though there is also evidence that stylistic markers can shift significantly over time.
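The general idea of separating authors by numerical text indices can be sketched in a few lines. This is a minimal illustration only. It is not Coh-Metrix and not necessarily the statistical procedure used in this paper. It computes three simple surface indices (mean sentence length, type-token ratio, mean word length) as hypothetical stand-ins for Coh-Metrix measures. It then standardizes them and assigns an unseen text to the author whose centroid is nearest (a nearest-centroid classifier). The function names `indices`, `train`, and `classify`, along with the toy texts, are invented for illustration.

```python
import re
from statistics import mean, pstdev


def indices(text):
    """Three simple surface indices (hypothetical stand-ins, NOT Coh-Metrix measures)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [
        len(words) / len(sentences),      # mean sentence length in words
        len(set(words)) / len(words),     # type-token ratio
        mean(len(w) for w in words),      # mean word length in letters
    ]


def standardize(vec, mus, sds):
    """Convert raw index values to z-scores so no index dominates by scale."""
    return [(v - m) / s for v, m, s in zip(vec, mus, sds)]


def train(labelled):
    """labelled: list of (author, text) pairs -> (means, sds, per-author centroids)."""
    rows = [indices(text) for _, text in labelled]
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    grouped = {}
    for (author, _), row in zip(labelled, rows):
        grouped.setdefault(author, []).append(standardize(row, mus, sds))
    centroids = {a: [mean(c) for c in zip(*vecs)] for a, vecs in grouped.items()}
    return mus, sds, centroids


def classify(text, model):
    """Assign text to the author with the smallest squared Euclidean distance."""
    mus, sds, centroids = model
    x = standardize(indices(text), mus, sds)

    def dist(author):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[author]))

    return min(centroids, key=dist)


# Toy training data: author "A" writes short sentences with short words,
# author "B" writes long sentences with long words.
training = [
    ("A", "The cat sat. The dog ran. I saw it. We hid."),
    ("A", "He ran home. She sat down. It was hot. We left."),
    ("B", "Considerable deliberation preceded the extraordinarily protracted "
          "negotiations concerning international agricultural regulations."),
    ("B", "Contemporary philosophical investigations frequently emphasize "
          "epistemological uncertainties surrounding observational methodologies."),
]
model = train(training)
print(classify("The sun set. We ate. It got cold.", model))
```

With only three coarse indices and a few toy texts, this separates very different styles. Distinguishing authors within a single register, as described above, would depend on a much richer set of indices and on proper statistical validation.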
