Feature Selection for Effective Text Classification using Semantic Information

Text categorization is the task of assigning texts or documents to pre-specified classes or categories. To classify documents well, text-based learning needs to take context into account: humans judge the relevance of a text through the context associated with it, so incorporating context information alongside the text should improve classification accuracy in machine learning. This can be achieved by using semantic information, such as part-of-speech tags associated with the text. The aim of this experiment is therefore to use this semantic information to select features that may yield better classification results. Several datasets are constructed, each from a different collection of features, to understand which representation of text data works best for different types of classifiers.

General Terms: Text Classification
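
As a rough illustration of the idea of selecting features by part-of-speech tag, the following minimal sketch builds two alternative bag-of-words datasets from the same documents. It is not the authors' pipeline: the sample documents, the Penn Treebank-style tags, and the function names `select_features` and `build_dataset` are all assumptions made for this example, and the tags are hard-coded where a real POS tagger would normally supply them.

```python
from collections import Counter

# Hypothetical pre-tagged input: (token, tag) pairs with Penn Treebank-style tags.
# In practice a POS tagger would produce these; they are hard-coded here.
TAGGED_DOCS = [
    [("Stocks", "NNS"), ("rose", "VBD"), ("sharply", "RB"),
     ("on", "IN"), ("Monday", "NNP")],
    [("The", "DT"), ("team", "NN"), ("won", "VBD"),
     ("the", "DT"), ("final", "JJ"), ("match", "NN")],
]


def select_features(tagged_doc, keep_prefixes):
    """Count lower-cased tokens whose tag starts with one of keep_prefixes."""
    prefixes = tuple(keep_prefixes)
    return Counter(
        token.lower() for token, tag in tagged_doc if tag.startswith(prefixes)
    )


def build_dataset(tagged_docs, keep_prefixes):
    """Build one bag-of-words feature dictionary per document."""
    return [select_features(doc, keep_prefixes) for doc in tagged_docs]


# Two alternative feature collections over the same documents.
nouns_only = build_dataset(TAGGED_DOCS, ("NN",))
nouns_and_verbs = build_dataset(TAGGED_DOCS, ("NN", "VB"))

for name, dataset in [("nouns", nouns_only), ("nouns+verbs", nouns_and_verbs)]:
    print(name, [dict(features) for features in dataset])
```

Each resulting dataset could then be passed to a different classifier to compare which feature collection suits it best; the choice of noun and verb tag prefixes here is only one possible configuration.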
