A Trinity of Trials: Surrey's 2014 Attempts at Author Verification

Encouraged by results from our work in previous PAN workshops, we explore three different approaches to author verification using stopword co-occurrence. High-frequency patterns of co-occurrence can be used, to some extent, as identifiers of an author’s style, and have been demonstrated to operate similarly across certain languages without requiring deeper linguistic knowledge. However, how best to use such information remains unresolved. We compare results from applying three approaches over such patterns: a frequency-mean-variance framework, a positional-frequency cosine comparison approach, and a cosine distance-based approach. A clearly advantageous approach across all languages and genres is yet to emerge.
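
As a rough illustration of the general idea of comparing stopword co-occurrence patterns by cosine similarity, the following minimal sketch may help. It is not the implementation evaluated in this paper: the stopword list, the window size of 5, the tokenisation, and the function names `cooccurrence_profile` and `cosine_similarity` are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only (an assumption, not the list used in the paper).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "it", "that", "was"}


def cooccurrence_profile(text, window=5):
    """Count ordered stopword pairs (w1, w2) where w2 follows w1
    within `window` tokens. The window size is an assumed parameter."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, w1 in enumerate(tokens):
        if w1 not in STOPWORDS:
            continue
        for w2 in tokens[i + 1 : i + 1 + window]:
            if w2 in STOPWORDS:
                counts[(w1, w2)] += 1
    return counts


def cosine_similarity(p, q):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile carries no stylistic evidence
    return dot / (norm_p * norm_q)
```

Under these assumptions, a verification decision could be made by computing `cosine_similarity` between the profile of the questioned document and the profile of the known documents, then comparing the score against a threshold tuned per language and genre. The threshold itself is left open here.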