A Topic Drift Model for Authorship Attribution

Authorship attribution is an active research direction because of its legal and financial importance. Its goal is to identify the author of an anonymous text. In this paper, we propose a Topic Drift Model (TDM), which tracks how authors' writing styles change over time while simultaneously learning their topical interests. Unlike previous authorship attribution approaches, our model is sensitive to temporal information and to word order, so it can extract more information from texts. Experimental results show that our model achieves higher accuracy than the compared models. We also demonstrate the potential of our model for the authorship verification problem.
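Many topic models used for authorship attribution, such as Latent Dirichlet Allocation, operate on a bag-of-words representation that discards word order. Two texts containing the same words in a different order therefore look identical to them. The toy Python sketch below illustrates that limitation, using bigram counts as a minimal order-sensitive alternative. It is not the TDM itself, whose order-sensitive mechanism is not described in this abstract; the function names `bag_of_words` and `bigrams` are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Unordered word counts, the kind of input an LDA-style topic model sees."""
    return Counter(text.lower().split())

def bigrams(text):
    """Counts of adjacent word pairs: a minimal order-sensitive representation."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

a = "the dog bit the man"
b = "the man bit the dog"

print(bag_of_words(a) == bag_of_words(b))  # True: the bags of words are identical
print(bigrams(a) == bigrams(b))            # False: the word order differs
```

Any representation that keeps some ordering information, whether n-grams, sequence models, or learned embeddings of word context, can separate these two texts, whereas a pure bag-of-words model cannot.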
