A Transformer-Based Approach to Authorship Attribution in Classical Arabic Texts

Authorship attribution (AA) is a natural language processing task that aims to identify the author of a given text. Although several studies address Arabic AA in general, AA for classical Arabic texts has received far less attention. This study investigates recent Arabic pretrained transformer-based models in a rarely studied domain: Islamic law. Because no dataset has been designed specifically for this task, we build our own from digital Islamic law resources. We then run several experiments on fine-tuning four Arabic pretrained transformer-based models: AraBERT, AraELECTRA, ARBERT, and MARBERT. The results indicate that, for the task of attributing a given text to its author, ARBERT and AraELECTRA outperform the other models with an accuracy of 96%. We conclude that pretrained transformer models, specifically ARBERT and AraELECTRA, fine-tuned on the Islamic legal dataset, are effective for AA of Islamic legal texts.
