Paper Information - Author Identification on Noise Arabic Documents

Author Identification on Noise Arabic Documents

In this work, we address the problem of authorship attribution for Ancient Arabic Philosophers. To that end, we conducted several authorship attribution experiments on different noisy Arabic texts. A dedicated dataset, called “A4P” (Authorship Attribution for Ancient Arabic Philosophers), was constructed by extracting texts from the books of 5 Ancient Arabic Philosophers, where the genre and the topic are similar. Our approach employs two types of features, character N-grams and words, and several classifiers, namely: Support Vector Machines, Multi Layer Perceptron, Linear Regression, Stamatatos distance and Manhattan distance. The results show that the failure limit and the classification performance depend on the features used, the classification technique and the level of noise. Overall, the performance of the proposed techniques is quite interesting and shows the effect of noise on authorship attribution.
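To make the feature and distance vocabulary of the abstract concrete, the following is a minimal sketch of character N-gram profiles compared with the Manhattan distance. It is an illustration under assumptions, not the authors' implementation: the function names (`char_ngram_profile`, `manhattan_distance`, `attribute`), the choice of relative frequencies, the default `n=3`, and the nearest-profile decision rule are all hypothetical, and the toy strings stand in for real A4P texts.

```python
from collections import Counter


def char_ngram_profile(text, n=3):
    """Return relative frequencies of the character n-grams in `text`.

    Hypothetical helper: the abstract does not say how profiles are built.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def manhattan_distance(p, q):
    """Sum of absolute frequency differences over the union of n-grams."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def attribute(unknown_text, reference_texts, n=3):
    """Pick the author whose reference profile is closest to the unknown text.

    `reference_texts` maps an author name to a sample of that author's text.
    """
    unknown = char_ngram_profile(unknown_text, n)
    distances = {
        author: manhattan_distance(unknown, char_ngram_profile(sample, n))
        for author, sample in reference_texts.items()
    }
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Toy data only; real experiments would use texts from the A4P dataset.
    references = {"A": "abcabcabcabc", "B": "xyzxyzxyz"}
    print(attribute("abcabcabc", references))
```

Because Python strings are sequences of Unicode code points, the same sketch applies to Arabic text without modification; how noise is injected and how the other classifiers are configured is not described in the abstract and is left out here.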

Halim Sayoud | S. Bourib
