Authorship Attribution in Bengali Language

We describe authorship attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and a learning curve to assess the relationship between the amount of training data and test accuracy. We achieve state-of-the-art results on a held-out dataset, thus indicating that lexical n-gram features are unarguably the best discriminators for authorship attribution of Bengali literary text.
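
To illustrate the kind of character n-gram features referred to above, here is a minimal sketch, not the system described in this work. The function names `char_ngrams` and `relative_freqs` and the sample sentence are hypothetical, and the sketch assumes n-grams are taken over Unicode code points, so Bengali vowel signs and conjunct marks count as separate characters.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, one character per Unicode code point."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_freqs(counts):
    """Normalize raw n-gram counts so that passages of different lengths are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample passage; any Bengali string can be used here.
sample = "আমি বাংলায় গান গাই"
profile = relative_freqs(char_ngrams(sample, n=2))
```

Relative-frequency profiles of this kind could then be passed to any standard classifier; the choice of n and of normalization here is an assumption made for illustration only.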
