TEXT CLASSIFICATION AND CLASSIFIERS: A SURVEY

As most information (over 80%) is stored as text, text mining is believed to have high potential commercial value. Knowledge may be discovered from many sources of information, yet unstructured text remains the largest readily available source of knowledge. Text classification is the task of assigning documents to predefined categories. In this paper we give an introduction to text classification and the text classification process, present an overview of classifiers, and compare some existing classifiers on the basis of a few criteria such as time complexity, principle, and performance.
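The text classification process summarized above (preprocess the documents, represent them as term counts, learn a classifier from labeled examples, then assign each new document to one of the predefined categories) can be illustrated with a multinomial Naive Bayes classifier, one of the classic text classifiers. This is a minimal sketch under stated assumptions, not the method of any surveyed paper: the regex tokenizer, the smoothing constant `alpha`, the hypothetical class name `NaiveBayesTextClassifier`, and the toy documents and labels are all chosen only for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Minimal preprocessing (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts (sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per category
        self.word_counts = defaultdict(Counter)  # term counts per category
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, doc, label):
        # log P(c) + sum of log P(w | c) over the document's tokens
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # terms never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # Assign the predefined category with the highest posterior score.
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


# Toy usage with made-up training data (illustration only).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
train_labels = ["sports", "sports", "finance", "finance"]
clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("a late goal won the match"))     # sports
print(clf.predict("interest rates and inflation"))  # finance
```

In this sketch, training is linear in the total number of training tokens, and classifying a document of length |d| costs O(|C| × |d|) for |C| categories. Time complexity of this kind is one of the criteria on which classifiers can be compared.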
