Paper information - Topic identification of Arabic noisy texts based on KNN

This paper addresses topic identification in noisy Arabic texts, an important research field given the growing amount of textual information shared worldwide. The dataset used in this study was built by collecting corrupted Arabic texts from discussion forums covering six different topics. The proposed algorithms use a k-nearest neighbor (KNN) classifier based on TF-IDF to identify the topic of each text. Two training schemes are proposed for creating the reference profiles, and several distance measures are proposed and used to compute the similarity between texts and topics. Results show that the proposed distance measures are of interest for topic identification.
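The general idea of a TF-IDF based KNN topic classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the two training schemes or the proposed distance measures, so the sketch assumes that every training text serves as its own reference profile and uses cosine distance as a stand-in. The toy English tokens, the topic labels, and all function names are hypothetical.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Return (vectors, idf) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; other weightings are possible).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [vectorize(doc, idf) for doc in docs]
    return vectors, idf


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors (stand-in measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training texts closest to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_distance(query, pair[0]),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; the paper uses Arabic forum texts on six topics.
train_docs = [
    ["match", "goal", "team", "player"],
    ["team", "league", "goal", "coach"],
    ["player", "coach", "match", "win"],
    ["election", "vote", "party", "minister"],
    ["party", "government", "vote", "law"],
    ["minister", "law", "election", "government"],
]
train_labels = ["sport", "sport", "sport", "politics", "politics", "politics"]

train_vectors, idf = build_tfidf(train_docs)
query_vector = vectorize(["goal", "coach", "team"], idf)
print(knn_predict(query_vector, train_vectors, train_labels, k=3))
```

Swapping `cosine_distance` for another function with the same two-argument signature is enough to try a different distance measure, which is the kind of comparison the abstract describes.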

Kheireddine Abainia | Siham Ouamour | Halim Sayoud
