The Computational Case against Computational Literary Studies

This essay works at the empirical level to isolate a series of technical problems, logical fallacies, and conceptual flaws in an increasingly popular subfield of literary studies variously known as cultural analytics, literary data mining, quantitative formalism, literary text mining, computational textual analysis, computational criticism, algorithmic literary studies, social computing for literary studies, and computational literary studies (the phrase I use here). In a nutshell, the problem with computational literary analysis as it stands is that what is robust is obvious (in the empirical sense) and what is not obvious is not robust, a situation not easily overcome given the nature of literary data and the nature of statistical inquiry. There is a fundamental mismatch between the statistical tools that are used and the objects to which they are applied. Digital humanities (DH), a field of study that can encompass subjects as diverse as histories of media and early computational practices, the digitization of texts for open access, digital inscription and mediation, computational linguistics and lexicology, and technical papers on data mining, is not the object of my critique. Rather, I am addressing specifically the project of running computer programs on large (or usually not so large) corpora of literary texts to yield quantitative results, which are then mapped, graphed, and tested for statistical significance and used to make claims about literature or literary history.