Text categorization based on regularization extreme learning machine

This article proposes a novel approach to text categorization based on a regularization extreme learning machine (RELM), whose weights can be obtained analytically and in which a bias-variance trade-off is achieved by adding a regularization term to the linear system of a single-hidden-layer feedforward neural network. To fit the input scale of RELM, latent semantic analysis is used to represent text in a reduced-dimensional space. Moreover, a classification algorithm based on RELM is developed for both the uni-label situation (i.e., a document can be assigned to only one category) and the multi-label situation (i.e., a document can be assigned to multiple categories simultaneously). Experimental results on two benchmarks show that the proposed method performs well in most cases and learns faster than popular methods such as feedforward neural networks or support vector machines.
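The pipeline summarized above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a ridge-style (Tikhonov) regularization term, a sigmoid hidden layer with random input weights, target labels encoded as +1/-1, and a decision threshold of 0 for the multi-label case. None of these details are stated in the abstract. The function names (`lsa_project`, `relm_fit`, `relm_scores`, `predict_uni`, `predict_multi`) and the defaults for `n_hidden` and `lam` are hypothetical.

```python
import numpy as np


def lsa_project(X, k):
    """Reduce a document-term matrix X (docs x terms) to k latent dimensions.

    Uses a truncated SVD, the core step of latent semantic analysis.
    Returns the reduced documents and the projection matrix for new documents.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T                      # terms x k
    return X @ Vk, Vk


def _hidden(Z, W, b):
    """Sigmoid hidden-layer output (activation choice is an assumption)."""
    return 1.0 / (1.0 + np.exp(-(Z @ W + b)))


def relm_fit(Z, T, n_hidden=50, lam=1e-2, seed=0):
    """Fit a regularized ELM on inputs Z (docs x k) and targets T (docs x classes).

    Hidden weights are drawn at random and left fixed; only the output
    weights beta are learned, in closed form:
        beta = (H^T H + lam * I)^{-1} H^T T
    The ridge form of the penalty is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(Z.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = _hidden(Z, W, b)
    A = H.T @ H + lam * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta


def relm_scores(Z, model):
    """Raw output scores (docs x classes) for inputs Z."""
    W, b, beta = model
    return _hidden(Z, W, b) @ beta


def predict_uni(scores):
    """Uni-label: assign each document to its single highest-scoring category."""
    return scores.argmax(axis=1)


def predict_multi(scores, threshold=0.0):
    """Multi-label: assign every category whose score exceeds the threshold.

    The threshold of 0 matches the assumed +1/-1 target encoding.
    """
    return scores > threshold
```

Because the output weights come from a single linear solve rather than iterative gradient descent, training cost is dominated by one matrix factorization. Larger values of `lam` shrink `beta` toward zero, trading variance for bias.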
