The Key Factors and Their Influence in Authorship Attribution

Authorship attribution has a long history dating back to the 19th century. Existing studies have applied different sets of stylometric features and computational methodologies to a variety of corpora of different lengths and genres. This study presents a protocol for a systematic literature review (SLR) aimed at identifying the best combination of stylometric features and computational methodology. Specifically, we formulate an SLR protocol that can guide a literature survey toward answering questions such as (i) whether it is possible to identify an author's style regardless of the genre and length of the text, and (ii) how to select specific stylometric features and a computational methodology. We also present an example of how the proposed SLR protocol can be used as a template for publication extraction and filtering in an SLR on authorship attribution.
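The publication extraction and filtering step can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the protocol:

* the `Publication` record is hypothetical;
* the search terms in `INCLUDE_TERMS` are hypothetical;
* the year range in `YEAR_RANGE` is hypothetical.

Records retrieved from several digital libraries are first de-duplicated by a normalised title. The illustrative inclusion criteria are then applied: a keyword match on title and abstract, plus the year range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    # Hypothetical record format for one search result.
    title: str
    abstract: str
    year: int
    source: str  # digital library the record was retrieved from


# Hypothetical inclusion criteria, for illustration only.
INCLUDE_TERMS = ("authorship attribution", "authorship identification", "stylometr")
YEAR_RANGE = (1887, 2015)


def normalise_title(title: str) -> str:
    """Lower-case and keep only alphanumerics so duplicates from different sources match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def meets_inclusion(pub: Publication) -> bool:
    """True if the record falls in the year range and mentions at least one search term."""
    text = f"{pub.title} {pub.abstract}".lower()
    in_range = YEAR_RANGE[0] <= pub.year <= YEAR_RANGE[1]
    return in_range and any(term in text for term in INCLUDE_TERMS)


def filter_publications(records):
    """Drop duplicate titles (first occurrence wins), then apply the inclusion criteria."""
    seen = set()
    kept = []
    for pub in records:
        key = normalise_title(pub.title)
        if key in seen:
            continue
        seen.add(key)
        if meets_inclusion(pub):
            kept.append(pub)
    return kept


if __name__ == "__main__":
    records = [
        Publication("Authorship Attribution", "Survey of stylometric features.", 2007, "IEEE Xplore"),
        Publication("Authorship attribution", "Duplicate record.", 2007, "ACM DL"),
        Publication("Software outsourcing trust", "An SLR protocol.", 2010, "ACM DL"),
    ]
    print([p.title for p in filter_publications(records)])  # ['Authorship Attribution']
```

De-duplicating before screening keeps each study from being counted once per digital library. A real protocol would record the reason each record was excluded, which this sketch omits.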
