Improving Authorship Attribution: Optimizing Burrows' Delta Method*

Abstract: Burrows' Delta Method (Burrows, 2002) is a leading method of authorship attribution. It can be used to shortlist potential authors from a list of candidates or even to identify the likely author. The technique has been extended by Hoover (2004a, 2006). In this investigation, we examine how four factors affect the accuracy of text classification: the choice of words for the word vector, the size of the word vector, the similarity measure, and the choice of corpus. Our results show that a word frequency vector of between 200 and 300 words gives the most accurate results (Aldridge, 2007). We also demonstrate a dramatic improvement in accuracy by adapting Burrows' Delta to use the cosine similarity measure. Additionally, our results indicate areas where the word vector can be optimized further for more accurate results.
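
To make the comparison of similarity measures concrete, the following is a minimal sketch, not the implementation used in this investigation. It assumes the usual formulation of Burrows' Delta as the mean absolute difference between z-scored word frequencies, and it assumes that the cosine variant is computed on the same z-score vectors; the abstract does not spell out either detail. The function names, the four-word vectors and the frequency values are all hypothetical illustrations.

```python
import math
from statistics import mean, pstdev


def z_scores(freqs_by_text):
    """Standardize each word's frequency across all texts (column-wise z-scores)."""
    n_words = len(next(iter(freqs_by_text.values())))
    columns = [[vec[i] for vec in freqs_by_text.values()] for i in range(n_words)]
    mus = [mean(col) for col in columns]
    # Assumption: population standard deviation; fall back to 1.0 if a word never varies.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return {
        name: [(vec[i] - mus[i]) / sigmas[i] for i in range(n_words)]
        for name, vec in freqs_by_text.items()
    }


def burrows_delta(za, zb):
    """Classic Delta: mean absolute difference of z-scores (smaller = more similar)."""
    return mean(abs(a - b) for a, b in zip(za, zb))


def cosine_distance(za, zb):
    """One minus cosine similarity of the z-score vectors (smaller = more similar)."""
    dot = sum(a * b for a, b in zip(za, zb))
    norm_a = math.sqrt(sum(a * a for a in za))
    norm_b = math.sqrt(sum(b * b for b in zb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as uninformative
    return 1.0 - dot / (norm_a * norm_b)


def nearest_author(disputed, candidates, z, distance):
    """Return the candidate whose z-score vector is closest to the disputed text."""
    return min(candidates, key=lambda name: distance(z[disputed], z[name]))


# Hypothetical relative frequencies (per 100 words) of four frequent words.
freqs = {
    "author_A": [6.0, 3.0, 2.0, 1.0],
    "author_B": [4.0, 4.0, 1.0, 2.0],
    "disputed": [5.8, 3.1, 1.9, 1.1],
}
z = z_scores(freqs)
candidates = ["author_A", "author_B"]
delta_choice = nearest_author("disputed", candidates, z, burrows_delta)
cosine_choice = nearest_author("disputed", candidates, z, cosine_distance)
```

With these made-up numbers both measures attribute the disputed text to author_A. A realistic experiment would use a vector of a few hundred words, in line with the 200 to 300 range reported above, and would compute the means and standard deviations from a reference corpus rather than from three texts.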