Support vector machine text classification system: Using Ant Colony Optimization based feature subset selection

Feature subset selection (FSS) is an important step in building effective text classification (TC) systems. In this work, we implemented a support vector machine (SVM) text classifier for Arabic articles, together with a novel FSS method based on Ant Colony Optimization (ACO) and the Chi-square statistic. The proposed ACO-based FSS method uses the Chi-square statistic as heuristic information and the effectiveness of the SVM classifier as a guide to improve the selection of features for each category. Compared with six state-of-the-art FSS methods, our ACO-based FSS algorithm achieved better TC effectiveness. The evaluation used an in-house Arabic text classification corpus of 1445 documents, independently classified into nine categories. The experimental results are presented in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1.
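To make the two quantities named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard 2x2 contingency-table form of the Chi-square statistic for a term and a category, and it assumes macro-averaged F1 is the mean of the per-category F1 values (some authors instead compute F1 from the macro-averaged precision and recall). The function names `chi_square` and `macro_average` are illustrative only.

```python
from typing import Sequence, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for one category (2x2 contingency table).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence of dependence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def macro_average(
    counts: Sequence[Tuple[int, int, int]]
) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1.

    counts holds one (true_positives, false_positives, false_negatives)
    triple per category. Each category contributes equally to the average,
    regardless of how many documents it contains.
    """
    precisions, recalls, f1s = [], [], []
    for tp, fp, fn in counts:
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(counts)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In an ACO-based selector of the kind described, a score such as `chi_square` could serve as the per-feature heuristic value, while a classifier-level measure such as the macro-averaged F1 could drive the pheromone update; how the paper combines them is not specified in the abstract.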
