Classification of Persian textual documents using learning vector quantization

Classifying text documents into a predefined set of classes is an important task in many natural language processing applications. There is usually a trade-off between the accuracy and the complexity of text classification systems. This paper presents an experiment in classifying Persian documents with the Learning Vector Quantization (LVQ) network. In this method, each class is represented by one or more exemplar vectors called codebook vectors. The codebook vectors are placed in the feature space so that the class decision boundaries are approximated by the nearest-neighbor rule. Compared with the k-nearest neighbor (k-NN) method, LVQ requires fewer training examples and is believed to be much faster than other classification methods. The experimental results obtained from classifying Persian text documents with the LVQ algorithm are promising and indicate that LVQ can serve as an alternative to other methods such as Support Vector Machines (SVMs).
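The abstract does not state which LVQ variant was used. The following is therefore only a minimal sketch of Kohonen's basic LVQ1 rule, written under several assumptions. Documents are assumed to be already converted into numeric term-weight vectors; TF-IDF would be one option, and the toy vectors below are hypothetical. The helper names `init_codebooks`, `train_lvq1` and `classify`, along with the parameters `per_class`, `alpha0` and `epochs`, are illustrative choices and do not come from the paper. During training, the codebook vector nearest to a sample moves toward that sample when their labels agree and away from it when they differ. At classification time, a document is compared only with the codebook vectors, not with every training document as in k-NN, and this is where LVQ's speed advantage comes from.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Codebook = Tuple[Vector, str]  # (prototype vector, class label)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codebook(x: Sequence[float], codebooks: List[Codebook]) -> int:
    """Index of the codebook vector closest to x (ties go to the lowest index)."""
    return min(range(len(codebooks)), key=lambda i: squared_distance(x, codebooks[i][0]))


def init_codebooks(samples: List[Vector], labels: List[str], per_class: int = 1) -> List[Codebook]:
    """Hypothetical initialisation: copy the first `per_class` samples of each class."""
    counts: Dict[str, int] = {}
    codebooks: List[Codebook] = []
    for x, y in zip(samples, labels):
        if counts.get(y, 0) < per_class:
            codebooks.append((list(x), y))
            counts[y] = counts.get(y, 0) + 1
    return codebooks


def train_lvq1(samples: List[Vector], labels: List[str], codebooks: List[Codebook],
               alpha0: float = 0.1, epochs: int = 20, seed: int = 0) -> List[Codebook]:
    """LVQ1: attract the winning codebook if its label matches, repel it otherwise."""
    rng = random.Random(seed)
    cb = [(list(v), c) for v, c in codebooks]  # work on copies of the prototypes
    order = list(range(len(samples)))
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            alpha = alpha0 * (1.0 - step / total)  # linearly decreasing learning rate
            step += 1
            x, y = samples[idx], labels[idx]
            w, c = cb[nearest_codebook(x, cb)]
            sign = 1.0 if c == y else -1.0
            for j in range(len(w)):
                w[j] += sign * alpha * (x[j] - w[j])  # updates the prototype in place
    return cb


def classify(x: Sequence[float], codebooks: List[Codebook]) -> str:
    """Nearest-prototype rule: return the label of the closest codebook vector."""
    return codebooks[nearest_codebook(x, codebooks)][1]


# Hypothetical toy data: 3-term weight vectors for two made-up classes.
samples = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.8, 0.0, 0.2],
           [0.1, 0.9, 1.0], [0.0, 1.0, 0.8], [0.2, 0.8, 0.9]]
labels = ["sports"] * 3 + ["politics"] * 3

codebooks = train_lvq1(samples, labels, init_codebooks(samples, labels))
print(classify([0.95, 0.05, 0.1], codebooks))  # sports
print(classify([0.05, 0.95, 0.9], codebooks))  # politics
```

In this toy case two codebook vectors stand in for six training documents. Each new document therefore needs two distance computations, whereas k-NN would need six. Real Persian text vectors would be high-dimensional and would probably need several codebook vectors per class.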
