A Survey on Text Classification with Different Types of Classification Methods

Text classification is gaining importance because of the large number of electronic documents now accessible from a variety of sources. Text classification (also called text categorization) is the task of assigning predefined categories to documents. Text mining is the process of finding interesting regularities in large textual collections, where interesting means non-trivial, hidden, previously unknown, and potentially useful. The goal of text mining is to enable users to extract information from textual resources; it combines operations such as retrieval, classification, clustering, data mining, natural language processing, and machine learning techniques to classify different patterns. A major difficulty of text categorization is the high dimensionality of the feature space. Reducing dimensionality by selecting a subset of the original attributes is known as feature selection. This paper discusses feature selection methods that reduce the dimensionality of the dataset by removing features considered irrelevant for classification. It surveys text classification, several approaches to it, feature selection methods, and applications of text classification.
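As an illustration of feature selection as described above (keeping a subset of the original attributes), the following is a minimal sketch only. It assumes a simple document-frequency threshold as the selection criterion, which is one common choice and not necessarily a method evaluated in this paper. The function names `select_features_by_df` and `project`, the `min_df` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter


def select_features_by_df(docs, min_df=2):
    """Keep only terms that occur in at least `min_df` documents.

    Assumption: whitespace tokenization and lowercasing stand in for a
    real preprocessing pipeline.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


def project(doc, vocabulary):
    """Map a document to term counts over the selected vocabulary only."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical toy corpus: two spam-like and two work-like documents.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]

all_terms = {term for doc in docs for term in doc.lower().split()}
vocab = select_features_by_df(docs, min_df=2)
vectors = [project(doc, vocab) for doc in docs]

print(len(all_terms), "terms before selection,", len(vocab), "after")
print(vocab)
print(vectors)
```

The selected vocabulary is a strict subset of the original terms, so each document vector shrinks accordingly; terms seen in only one document are dropped as uninformative under this assumed criterion.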
