Author Identification on Noise Arabic Documents

In this work, we address the problem of authorship attribution for Ancient Arabic Philosophers. To that end, we conducted several authorship attribution experiments on different noisy Arabic texts. A dedicated dataset, called “A4P” (Authorship Attribution for Ancient Arabic Philosophers), was constructed by extracting texts from the books of 5 Ancient Arabic Philosophers, where the genre and the topic are similar. Our approach employs two types of features, character N-grams and words, and several classifiers, namely Support Vector Machines, Multi-Layer Perceptron, Linear Regression, Stamatatos distance and Manhattan distance. The results show that the failure limit and the classification performance depend on the features used, the classification technique and the level of noise. Overall, the results are of interest in that they show the effect of noise on authorship attribution.
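As an illustration of how character N-gram features and a Manhattan distance can be combined for attribution on noisy text, here is a minimal Python sketch. It is not the authors' implementation: the function names, the default N = 3, and in particular the noise model (random character substitution at a given rate) are assumptions made for the example, since the abstract does not specify how noise is introduced.

```python
from collections import Counter
import random


def char_ngram_profile(text, n=3):
    """Return the relative frequency of each character n-gram in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def add_noise(text, level, alphabet, seed=0):
    """Hypothetical noise model: replace each character with a random
    symbol from alphabet with probability level (assumption)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars)):
        if rng.random() < level:
            chars[i] = rng.choice(alphabet)
    return "".join(chars)


def attribute(unknown, references, n=3):
    """Return the author whose reference text has the closest profile."""
    profile = char_ngram_profile(unknown, n)
    return min(
        references,
        key=lambda author: manhattan(
            profile, char_ngram_profile(references[author], n)
        ),
    )
```

Under these assumptions, one could call `add_noise` on a test document at increasing values of `level` and check at which point `attribute` stops returning the correct author, which gives a rough analogue of the failure limit mentioned above.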