Comparing TR-Classifier and KNN by using Reduced Sizes of Vocabularies

This study addresses topic identification with two methods: the TR-Classifier, a new method that we propose and that is based on computing triggers, and the well-known k-nearest neighbors (kNN) algorithm. Performance is acceptable, particularly for the TR-Classifier, even though we used reduced vocabulary sizes. For the TR-Classifier, each topic is represented by a vocabulary built from the corresponding training corpus, whereas the kNN method uses a general vocabulary obtained by concatenating the vocabularies used by the TR-Classifier. For evaluation, six topics were selected: culture, religion, economy, local news, international news, and sports. The experiments were conducted on an Arabic corpus.
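The vocabulary setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how words are selected for a topic vocabulary, so keeping the most frequent words is an assumption, as are the cosine similarity, the majority vote, the value of `k`, and every function name. The toy documents are English placeholders rather than the Arabic corpus, and the TR-Classifier itself is not sketched because the abstract does not define its trigger computation.

```python
import math
from collections import Counter


def topic_vocabulary(docs, size):
    """Keep the `size` most frequent words of one topic (assumed selection criterion)."""
    counts = Counter(word for doc in docs for word in doc.split())
    return [word for word, _ in counts.most_common(size)]


def general_vocabulary(topic_vocabs):
    """Concatenate the per-topic vocabularies, skipping words already included."""
    seen, merged = set(), []
    for vocab in topic_vocabs.values():
        for word in vocab:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


def vectorize(doc, vocab):
    """Represent a document as raw word counts over the given vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(doc, training, vocab, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = vectorize(doc, vocab)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, vectorize(item[0], vocab)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy placeholder corpus with two of the six topics.
corpus = {
    "sports": ["match goal team", "team player goal"],
    "economy": ["market bank price", "bank price trade"],
}

# One reduced vocabulary per topic, then the general vocabulary for kNN.
topic_vocabs = {topic: topic_vocabulary(docs, 2) for topic, docs in corpus.items()}
general = general_vocabulary(topic_vocabs)

training = [(doc, topic) for topic, docs in corpus.items() for doc in docs]
prediction = knn_predict("team goal win", training, general, k=3)
```

With this toy data, each topic keeps two words, the general vocabulary holds four, and the query "team goal win" shares words only with the sports documents.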

Kamel Smaïli | Daoud Berkani | Mourad Abbas
