Automatic Arabic Text Classification

Automated document classification is an important text mining task, especially given the rapid growth in the number of online documents written in Arabic. Text classification aims to automatically assign a text to one of a set of predefined categories based on its linguistic features. The process has many useful applications, including, but not limited to, e-mail spam detection, web page content filtering, and automatic message routing. This paper presents the results of document classification experiments conducted on seven different Arabic corpora using a statistical methodology. It also evaluates how well two popular classification algorithms classify these corpora.