Paper information - Language Independent Authorship Attribution using Character Level Language Models

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present experimental results on Greek, English, and Chinese data. We show that our approach achieves state-of-the-art performance in each of these cases. In particular, we obtain an 18% accuracy improvement over the best published results for a Greek data set, while using a far simpler technique than previous investigations.
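The general idea of attributing a text with character-level n-gram language models can be illustrated with a minimal sketch: train one model per candidate author, then assign an unseen text to the author whose model gives it the lowest cross-entropy (bits per character). The abstract does not state the n-gram order or the smoothing scheme, so the trigram order, the add-k smoothing, the toy training strings, and the names `CharNgramModel` and `attribute` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character-level n-gram model with add-k smoothing.

    Both the order n and the smoothing method are assumptions of this
    sketch; the abstract does not specify them.
    """

    def __init__(self, n=3, k=1.0):
        self.n = n
        self.k = k
        self.ngram_counts = Counter()    # (context, next_char) -> count
        self.context_counts = Counter()  # context -> count
        self.vocab = set()

    def _pad(self, text):
        # Prefix with start markers so the first characters have a full context.
        return "\x02" * (self.n - 1) + text

    def train(self, text):
        padded = self._pad(text)
        self.vocab.update(text)
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            self.ngram_counts[(context, padded[i])] += 1
            self.context_counts[context] += 1

    def cross_entropy(self, text):
        """Average negative log2 probability per character of `text`."""
        padded = self._pad(text)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen characters
        total_bits = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            numerator = self.ngram_counts[(context, padded[i])] + self.k
            denominator = self.context_counts[context] + self.k * vocab_size
            total_bits -= math.log2(numerator / denominator)
        return total_bits / max(len(text), 1)


def attribute(models, text):
    """Return the author whose model assigns `text` the lowest cross-entropy."""
    return min(models, key=lambda author: models[author].cross_entropy(text))


# Toy training data, invented purely for illustration.
training = {
    "author_a": "the cat sat on the mat. the cat ate the rat.",
    "author_b": "quantum fields fluctuate; quarks and gluons bind.",
}

models = {}
for author, sample in training.items():
    model = CharNgramModel(n=3)
    model.train(sample)
    models[author] = model

print(attribute(models, "the rat sat on the cat"))
```

Because the model operates on raw characters, the sketch needs no tokenizer, stemmer, or hand-picked feature set, which is the property that makes this family of methods applicable to languages as different as Greek, English, and Chinese.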
