Improving Arabic Text Categorization Using Neural Network with SVD

In this paper, we present a model based on an artificial neural network (ANN) for classifying Arabic texts. We propose using Singular Value Decomposition (SVD) as a preprocessor for the ANN to reduce both the size and the dimensionality of the input data, so that the data become easier to classify and the training process of the ANN converges faster. To test the effectiveness of the proposed model, experiments were conducted on an in-house collected Arabic corpus for text categorization. The results show that the proposed model achieves high categorization effectiveness as measured by precision, recall and F-measure, and that the ANN with SVD preprocessing outperforms the basic ANN on Arabic text classification.
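The SVD preprocessing step can be illustrated with a short Python/NumPy sketch. It is not the authors' implementation. The toy term-document matrix, the number of retained dimensions `k`, the single tanh hidden layer, the learning rate and the helper names (`truncated_svd_projection`, `train_mlp`, `predict`, `macro_prf`) are all hypothetical. In practice `X` would hold term weights (for example TF-IDF, an assumption here) for the preprocessed Arabic documents. The documents are projected onto the top-`k` right singular vectors of `X`, a small network is trained by backpropagation on the `k`-dimensional vectors, and macro-averaged precision, recall and F-measure are computed from the predictions.

```python
import numpy as np


def truncated_svd_projection(X, k):
    """Project documents (rows of X) onto the top-k right singular vectors.

    X is an (n_docs, n_terms) term-weight matrix. Returns (Z, Vk), where
    Vk is (n_terms, k) and Z = X @ Vk is the (n_docs, k) reduced input.
    """
    # Thin SVD: X = U @ diag(s) @ Vt, singular values sorted descending.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T
    return X @ Vk, Vk


def train_mlp(Z, y, n_classes, hidden=8, lr=0.5, epochs=1000, seed=0):
    """Full-batch backpropagation for a 1-hidden-layer tanh/softmax network."""
    rng = np.random.default_rng(seed)
    d = Z.shape[1]
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    n = len(y)
    for _ in range(epochs):
        # Forward pass.
        H = np.tanh(Z @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass for the mean cross-entropy loss.
        G = (P - Y) / n
        dH = (G @ W2.T) * (1.0 - H ** 2)  # uses W2 before it is updated
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Z.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, Z):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(Z @ W1 + b1) @ W2 + b2, axis=1)


def macro_prf(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall and F-measure over all classes."""
    ps, rs, fs = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    return float(np.mean(ps)), float(np.mean(rs)), float(np.mean(fs))


# Hypothetical toy term-document matrix (rows = documents, columns = terms):
# documents 0-2 use terms 0-2 (class 0), documents 3-5 use terms 3-5 (class 1).
X = np.array([
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 0],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 0, 1, 3],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

Z, Vk = truncated_svd_projection(X, k=2)  # 6 term features -> 2 inputs
params = train_mlp(Z, y, n_classes=2)
pred = predict(params, Z)
print("reduced shape:", Z.shape)
print("macro P/R/F:", macro_prf(y, pred, 2))
```

New test documents would be mapped into the same reduced space with the training basis, `x_new @ Vk`, before being passed to `predict`; recomputing the SVD on test data would yield an incompatible basis. With this design the network's input layer has `k` units instead of one per term, which shrinks the first weight matrix from `n_terms × hidden` to `k × hidden`.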
