Automatic authorship classification of two ancient books: Quran and Hadith

The need for a rigorous, scientific tool for automatic authorship classification has become important, especially for authenticating ancient documents such as religious or historical books. In this paper, we conduct authorship classification experiments on the Quran and the Hadith to see whether they could have the same author (i.e., was the Quran written by the Prophet, or only sent down to him, as claimed?). This task, commonly called authorship discrimination, is an important application of authorship classification. It consists of checking whether two texts were written by the same author, using Artificial Intelligence (AI) and Text Mining (TM) techniques. Two main investigations are conducted and presented. In the first, the two books are analyzed globally, each as a whole. In the second, the two books are segmented into 25 text segments: 14 extracted from the Quran and 11 from the Hadith. The segments are of roughly equal size, with approximately 2080 tokens each. Several classifiers are employed: SMO-based Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and Linear Regression (LR). The experiments provide interesting information on the origins of these ancient books.
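The segmentation step of the second investigation can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and uses relative frequencies of the most frequent words as features; the abstract specifies neither, so both choices, as well as the function names `segment_tokens`, `top_words` and `relative_frequencies`, are illustrative assumptions and not the authors' method.

```python
from collections import Counter


def segment_tokens(tokens, segment_size=2080):
    """Split a token list into consecutive segments of at most segment_size tokens.

    The default of 2080 mirrors the approximate segment size quoted in the
    abstract; the last segment may be shorter.
    """
    return [tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)]


def top_words(segments, k=50):
    """Return the k most frequent tokens over all segments (assumed feature set)."""
    counts = Counter(tok for seg in segments for tok in seg)
    return [word for word, _ in counts.most_common(k)]


def relative_frequencies(segment, vocabulary):
    """Map one segment to a vector of relative frequencies over the vocabulary."""
    counts = Counter(segment)
    total = len(segment) or 1  # guard against an empty segment
    return [counts[word] / total for word in vocabulary]


# Toy usage with a synthetic token stream (not real Quran or Hadith text).
toy_tokens = ("a b c a b a " * 1000).split()
toy_segments = segment_tokens(toy_tokens)
toy_vocabulary = top_words(toy_segments, k=2)
toy_features = [relative_frequencies(seg, toy_vocabulary) for seg in toy_segments]
```

Each feature vector, labelled with its source book, could then be passed to a classifier such as an SVM or an MLP to test whether segments from the two books are separable.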
