A comprehensive study of text classification algorithms

A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from these textual documents. Text classification is the process of assigning text documents to a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, and language identification. This paper presents a detailed survey of the text classification process and of the various algorithms used in this field.
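As a concrete illustration of assigning documents to a fixed set of predefined classes, the following is a minimal sketch of a multinomial Naive Bayes classifier, one commonly used text classification algorithm, applied to a toy spam-filtering task. It is an illustration only, not the method of any particular paper. The class name `NaiveBayesTextClassifier`, the whitespace tokenizer, and the four training documents are hypothetical. The classifier scores each class by its log prior plus the add-one smoothed log likelihood of every known word, and returns the highest-scoring class.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # A practical system would also strip punctuation, remove stop words, etc.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)         # class -> number of training documents
        self.word_counts = defaultdict(Counter)   # class -> word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of word tokens seen per class.
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior P(c) plus the sum of smoothed log likelihoods P(w | c).
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy training data for a two-class spam filter.
train_docs = [
    "win a free prize now",
    "cheap meds free offer",
    "meeting agenda for monday",
    "please review the project report",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("free prize offer"))           # spam
print(clf.predict("project meeting on monday"))  # ham
```

Add-one smoothing keeps a single unseen word-class pair from driving a class probability to zero. Working in log space avoids numerical underflow when many small probabilities are multiplied over long documents.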
