TEXT CLASSIFICATION AND CLASSIFIERS: A SURVEY

Vandana

As most information (over 80%) is stored as text, text mining is believed to have high commercial potential. Knowledge may be discovered from many sources of information; yet unstructured texts remain the largest readily available source of knowledge. Text classification assigns documents to predefined categories. In this paper we introduce text classification, describe the text classification process, give an overview of classifiers, and compare some existing classifiers on a few criteria such as time complexity, underlying principle, and performance.
