Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation

Efficient decision-making requires knowledge in the form of experience, which can be obtained through learning. This paper explores the learning process in text classification under the semi-supervised learning paradigm and compares the results with the accuracy of a supervised classifier. Semi-supervised learning can be applied when only a limited amount of training data is available. In the traditional K-nearest neighbour algorithm, every feature is given the same weight in every class, which is not reasonable: a feature may play a vital role in some classes while its presence has no impact in others. This paper explores assigning different weights to features in different classes, based on the concept of variance. Finally, to gain insight into the semi-supervised paradigm, supervised and semi-supervised learning in text classification are compared. The results show that the semi-supervised paradigm can be applied where very limited training data is available and still achieve reasonable classifier accuracy.
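The idea of class-specific, variance-based feature weights in K-nearest neighbour classification can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the inverse within-class variance rule used here is an assumption, as are the function names (`class_feature_weights`, `weighted_knn_predict`, `self_train`), the toy term-count vectors, and the simple self-training loop that stands in for the semi-supervised step. It is not the authors' enhanced KNN algorithm.

```python
from collections import Counter
from statistics import pvariance


def class_feature_weights(X, y, eps=1e-6):
    """Per-class feature weights (assumed rule: inverse within-class variance).

    A feature that varies little inside a class is treated as characteristic
    of that class and gets a larger weight. Weights sum to 1 per class.
    """
    weights = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        raw = [1.0 / (pvariance(col) + eps) for col in zip(*rows)]
        total = sum(raw)
        weights[label] = [w / total for w in raw]
    return weights


def weighted_knn_predict(X, y, query, k=3):
    """Majority vote among the k nearest neighbours, where the distance to each
    training document uses the feature weights of that document's class."""
    weights = class_feature_weights(X, y)
    scored = []
    for x, label in zip(X, y):
        w = weights[label]
        dist = sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, query))
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def self_train(X_lab, y_lab, X_unlab, k=3):
    """Hypothetical semi-supervised step: label each unlabelled document with
    the current classifier and add it to the training set."""
    X, y = list(X_lab), list(y_lab)
    for doc in X_unlab:
        y.append(weighted_knn_predict(X, y, doc, k))
        X.append(doc)
    return X, y


# Toy term-count vectors (illustrative data, not from the paper).
X_train = [[3, 0, 1], [4, 0, 0], [3, 1, 1], [0, 3, 1], [0, 4, 0], [1, 3, 1]]
y_train = ["sports", "sports", "sports", "tech", "tech", "tech"]

prediction = weighted_knn_predict(X_train, y_train, [3, 0, 1])
X_grown, y_grown = self_train(X_train, y_train, [[4, 1, 0], [0, 4, 1]])
```

In this sketch the supervised case is a direct call to `weighted_knn_predict` on the labelled set, while the semi-supervised case first grows the labelled set with `self_train`, so a comparison between the two paradigms amounts to comparing accuracy before and after that step.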
