Towards an authorship analysis of two religious documents

In this paper, we attempt to identify the authorship of two ancient Arabic religious books dating from the 6th century: the Holy Quran and the Hadith. The identification process comprises four phases: document collection, text preprocessing, feature extraction, and classification model building. Two series of experiments are conducted and discussed. The first addresses authorship identification of the two books using a Manhattan centroid distance and an SMO-SVM classifier. The second employs hierarchical clustering to separate the text segments belonging to each book. For this purpose, three types of original NLP features are combined before classification. The results show an authorship identification accuracy of 100%. All results of this investigation point to a clear authorship distinction between the two religious books.
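To make the first experiment more concrete, the following is a minimal sketch of a Manhattan centroid distance classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the three feature types, so character bigram frequencies stand in for them here, and the function names (`char_ngram_profile`, `centroid`, `manhattan`, `classify`) and the toy training strings are hypothetical placeholders rather than real corpus segments.

```python
from collections import Counter


def char_ngram_profile(text, n=2):
    """Relative frequencies of character n-grams in one text segment.

    Assumption: character bigrams stand in for the (unspecified) features.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(profiles):
    """Mean profile over all training segments of one class."""
    keys = set().union(*profiles)
    return {k: sum(p.get(k, 0.0) for p in profiles) / len(profiles) for k in keys}


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def classify(segment, centroids, n=2):
    """Assign the segment to the class whose centroid is nearest in L1 distance."""
    profile = char_ngram_profile(segment, n)
    return min(centroids, key=lambda label: manhattan(profile, centroids[label]))


# Toy placeholder segments (hypothetical, not taken from either book).
training = {
    "A": ["aaaa bbbb aaaa", "aaab bbba aaaa"],
    "B": ["xyxy zzzz xyxy", "xyxz zzzy xyxy"],
}
centroids = {
    label: centroid([char_ngram_profile(s) for s in segments])
    for label, segments in training.items()
}

print(classify("aaaa bbbb", centroids))
```

In a real setting, each book would be split into segments, part of them used to build the two centroids, and the held-out segments classified by nearest centroid. The same profiles could also feed an SVM or a hierarchical clustering step, as in the experiments described above.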
