A Visual Analytics based Investigation on the Authorship of the Holy Quran

In this paper, we present a visual-analytics-based investigation of the authorship of the Holy Quran with regard to the author of the Hadith (the Prophet). The problem can be framed as an authorship discrimination task between two religious books: the Quran and the Hadith. The first is the Divine book written by Allah (God), as claimed by the Prophet Muhammad, whereas the second is a collection of certified statements of the Prophet. Two visual analytics clustering methods are employed: hierarchical clustering and fuzzy c-means clustering. Seven types of NLP features are combined and normalized by PCA reduction before the classification process. The resulting 2D and 3D visualizations show two main clusters in both experiments, a Quran cluster and a Hadith cluster, and the arrangement of these clusters corresponds to a clear authorship distinction between the two religious books.
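The pipeline described above (feature vectors, PCA reduction, then clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for seven-dimensional feature vectors, the function names `pca_reduce` and `fuzzy_c_means` are hypothetical, the z-score standardization step and the fuzzifier `m = 2` are assumptions, and only the fuzzy c-means branch is shown, using NumPy alone.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Standardize each feature, then project onto the top principal components."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Basic fuzzy c-means: returns memberships U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every center (small offset avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Synthetic stand-in data (an assumption, not the texts studied in the paper):
# two groups of 20 segments, each described by 7 numeric features.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(20, 7))
group_b = rng.normal(5.0, 1.0, size=(20, 7))
X = np.vstack([group_a, group_b])

Z = pca_reduce(X, n_components=2)   # 2D coordinates suitable for plotting
U, centers = fuzzy_c_means(Z, c=2)  # soft memberships for two clusters
labels = U.argmax(axis=1)           # hard assignment, for inspection only
```

Each row of `U` gives the degree to which a segment belongs to each cluster, which is what distinguishes fuzzy c-means from a hard partition. The reduced coordinates `Z` could also be passed to a hierarchical clustering routine to produce a dendrogram.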
