Topic identification of Arabic noisy texts based on KNN

This paper addresses the problem of topic identification in noisy Arabic texts, an important research field given the growing amount of textual information shared worldwide. The dataset used in this study was built by collecting corrupted Arabic texts from several discussion forums covering six different topics. The proposed algorithms use a k-nearest neighbor (KNN) classifier over Tf-Idf features to identify the topic of each text. Two training schemes are proposed for creating the reference profiles, and several distance measures are proposed and used to compute the similarity between texts and topics. Results show that the proposed distance measures are promising for topic identification.
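
To make the general approach concrete, the following is a minimal sketch of KNN topic identification over Tf-Idf vectors. It is not the authors' implementation: the abstract does not specify the distance measures or the training schemes, so cosine distance, the smoothed idf formula, the value of k, the function names, and the toy English tokens (standing in for tokenized Arabic forum posts) are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_idf(docs):
    """Compute an idf weight per term from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: log(n / df) + 1 smoothing; the paper's weighting may differ.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def vectorize(doc, idf):
    """Map a tokenized document to a sparse Tf-Idf dict, ignoring unseen terms."""
    tf = Counter(term for term in doc if term in idf)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {term: (count / total) * idf[term] for term, count in tf.items()}

def cosine_distance(a, b):
    """Assumed distance measure: 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(query, train_vectors, train_labels, idf, k=3):
    """Return the majority topic among the k nearest training documents."""
    q = vectorize(query, idf)
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_distance(q, pair[0]),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus: two topics instead of the paper's six.
train_docs = [
    ["match", "goal", "team"],
    ["team", "player", "goal"],
    ["league", "match", "player"],
    ["bank", "market", "price"],
    ["price", "stock", "market"],
    ["bank", "stock", "loan"],
]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

idf = build_idf(train_docs)
train_vectors = [vectorize(doc, idf) for doc in train_docs]

# "zzz" imitates a corrupted token that never appears in training.
prediction = knn_predict(["goal", "player", "match", "zzz"],
                         train_vectors, train_labels, idf, k=3)
print(prediction)
```

Swapping `cosine_distance` for another function with the same two-argument signature is enough to compare alternative distance measures in this sketch.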