Unsupervised Learning for Document Classification: Feasibility, Limitation, and the Bottom Line

Unsupervised learning methods are usually proposed for document clustering, but the literature also contains work that applies them to document classification. This paper analyzes the feasibility and limitations of this practice and studies its efficacy through a preliminary case study on the Reuters-21578 document collection.
