A Trinity of Trials: Surrey's 2014 Attempts at Author Verification
Encouraged by the results of our approaches at previous PAN
workshops, we explore three different approaches based on stopword co-occurrence.
High-frequency patterns of co-occurrence can be used, to some
extent, as identifiers of an author's style, and have been shown to operate
similarly across certain languages without requiring deeper linguistic
knowledge. However, how best to use such information remains unresolved.
We compare results from applying three approaches to such patterns: a
frequency-mean-variance framework, a positional-frequency cosine comparison
approach, and a cosine distance-based approach. No approach has yet emerged
as clearly advantageous across all languages and genres.
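
To make the general idea concrete, the following is a minimal sketch of comparing two texts by the cosine similarity of their stopword co-occurrence profiles. It is not the system described in this paper: the stopword list, the window size, the whitespace tokenisation, the decision threshold and all function names are assumptions chosen purely for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used in the paper).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "it", "that", "was"}


def cooccurrence_profile(text, window=5):
    """Relative frequencies of ordered stopword pairs that occur within
    `window` tokens of each other. The window size is an assumed parameter."""
    tokens = text.lower().split()  # naive whitespace tokenisation
    counts = Counter()
    for i, left in enumerate(tokens):
        if left not in STOPWORDS:
            continue
        for right in tokens[i + 1 : i + 1 + window]:
            if right in STOPWORDS:
                counts[(left, right)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(value * q.get(pair, 0.0) for pair, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def same_author(known_texts, unknown_text, threshold=0.5):
    """Hypothetical verification decision: accept if the questioned text's
    profile is close enough to the profile of the known texts.
    The threshold is arbitrary and would need tuning per language and genre."""
    known = cooccurrence_profile(" ".join(known_texts))
    unknown = cooccurrence_profile(unknown_text)
    return cosine_similarity(known, unknown) >= threshold
```

A cosine distance, as named in the third approach, can be obtained from the same quantity as `1 - cosine_similarity(p, q)`. Because the profiles are built only from a fixed list of function words, the same code can be applied to another language by swapping the stopword list, which is in keeping with the claim that no deeper linguistic knowledge is required.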