Implementation of Unsupervised and Supervised Learning Systems for Multilingual Text Categorization

This paper discusses the implementation of leading supervised and unsupervised approaches to multilingual text categorization. We selected support vector machines (SVM) and latent semantic indexing (LSI) as representatives of the supervised and unsupervised methods, respectively. Preliminary results show that our platform models, which include both supervised and unsupervised learning methods, have potential for multilingual text categorization.
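To make the contrast between the two families concrete, the following is a minimal sketch in plain Python, not the system described in this paper. It assumes whitespace tokenization, raw term counts, a bias-free linear SVM trained by batch subgradient descent, and an LSI space fitted on parallel English and Spanish documents; all function names, hyperparameters, and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def vectorize(docs, vocab=None):
    """Turn whitespace-tokenised documents into term-count vectors."""
    tokens = [d.lower().split() for d in docs]
    if vocab is None:
        vocab = sorted({w for t in tokens for w in t})
    rows = []
    for t in tokens:
        counts = Counter(t)
        rows.append([float(counts[w]) for w in vocab])  # unknown words are ignored
    return vocab, rows

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Supervised: batch subgradient descent on the L2-regularised hinge loss.
    Simplification (assumption): no bias term, labels are +1 / -1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:  # example violates the margin
                for j, xij in enumerate(xi):
                    grad[j] -= yi * xij / len(X)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def lsi_term_space(X, k, iters=100):
    """Unsupervised: rank-k LSI via power iteration on the Gram matrix X X^T.
    Returns a term-to-latent matrix V with one k-dimensional row per term."""
    n = len(X)
    gram = [[dot(a, b) for b in X] for a in X]
    vecs, vals = [], []
    for j in range(k):
        v = [float((i + 1) ** (j + 1)) for i in range(n)]  # deterministic start
        for _ in range(iters):
            for u in vecs:  # stay orthogonal to components already found
                proj = dot(u, v)
                v = [a - proj * b for a, b in zip(v, u)]
            v = [dot(row, v) for row in gram]
            norm = math.sqrt(dot(v, v))
            v = [a / norm for a in v]
        vecs.append(v)
        vals.append(dot(v, [dot(row, v) for row in gram]))
    # V = X^T U S^{-1}, where S holds the singular values sqrt(eigenvalue)
    return [[sum(X[i][t] * vecs[j][i] for i in range(n)) / math.sqrt(vals[j])
             for j in range(k)] for t in range(len(X[0]))]

def fold_in(x, V):
    """Project a term-count vector into the latent space."""
    return [sum(x[t] * V[t][j] for t in range(len(x))) for j in range(len(V[0]))]

# Toy labelled data (made up for illustration): sports = +1, finance = -1.
train_docs = ["football match goal", "partido de futbol gol",
              "stock market bank", "mercado de valores banco"]
labels = [1, 1, -1, -1]
vocab, X = vectorize(train_docs)
w = train_linear_svm(X, labels)

def classify(text):
    _, (x,) = vectorize([text], vocab)
    return 1 if dot(w, x) > 0 else -1

# LSI is fitted on parallel (English + Spanish) documents so that
# translation-equivalent terms co-occur; no category labels are used.
parallel_docs = ["football goal futbol gol", "football match futbol partido",
                 "bank stock banco valores", "bank market banco mercado"]
lsi_vocab, P = vectorize(parallel_docs)
V = lsi_term_space(P, k=2)

def embed(text):
    _, (x,) = vectorize([text], lsi_vocab)
    return fold_in(x, V)
```

On this toy data, `classify` separates mixed-language sports and finance snippets using labels in both languages, whereas `embed` places the Spanish term "gol" next to the English term "football" without any labels, because the two co-occur in the parallel training documents. A realistic system would add term weighting, a proper tokenizer per language, and a numerical library for the decomposition.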
