The Research of kNN Text Categorization Algorithm Based on Eager Learning

Text categorization is a fundamental technique in text mining and has been an active research topic in data mining and web mining in recent years. It plays an important role in business, government decision-making and management, scientific research, and other areas. This paper presents an improved text categorization algorithm that combines eager learning with kNN classification. Experimental results show that the improved algorithm not only improves the efficiency of categorization but also significantly increases its accuracy, producing a qualitative leap in the practical value of the sensitive information system.
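The abstract does not say how the eager step is built, so the following is only a minimal sketch of one possible way to combine eager learning with kNN, not the paper's algorithm. It assumes that the eager phase precomputes one term-frequency centroid per class at training time, and that classification then runs a similarity-weighted kNN vote only over training documents from the classes whose centroids are closest to the query. The class name `EagerKNN`, the parameters `k` and `top_classes`, the whitespace tokenizer, and the cosine measure are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed, minimal tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class EagerKNN:
    """Hypothetical hybrid: eager class centroids prune the lazy kNN search."""

    def __init__(self, k=3, top_classes=2):
        self.k = k
        self.top_classes = top_classes

    def fit(self, docs, labels):
        # Store training vectors grouped by class (needed for the kNN step).
        self.by_class = defaultdict(list)
        for doc, label in zip(docs, labels):
            self.by_class[label].append(vectorize(doc))
        # Eager phase: one summed vector per class. Cosine similarity is
        # scale-invariant, so the sum behaves like the mean centroid.
        self.centroids = {}
        for label, vecs in self.by_class.items():
            centroid = Counter()
            for vec in vecs:
                centroid.update(vec)
            self.centroids[label] = centroid
        return self

    def predict(self, text):
        query = vectorize(text)
        # Use the eager model to keep only the closest classes.
        ranked = sorted(self.centroids,
                        key=lambda y: cosine(query, self.centroids[y]),
                        reverse=True)
        candidates = ranked[:self.top_classes]
        # Lazy phase: kNN restricted to documents of the candidate classes.
        scored = [(cosine(query, vec), label)
                  for label in candidates
                  for vec in self.by_class[label]]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Similarity-weighted vote among the k nearest neighbours.
        votes = Counter()
        for sim, label in scored[:self.k]:
            votes[label] += sim
        return votes.most_common(1)[0][0]
```

Under these assumptions, the pruning step is where any efficiency gain would come from: the query is compared with a handful of centroids first, and only a fraction of the training documents afterwards. Whether accuracy also improves depends on the data and on the settings of `k` and `top_classes`.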
