Text classification based on deep belief network and softmax regression

In this paper, we propose a novel hybrid text classification model based on a deep belief network (DBN) and softmax regression. A DBN is introduced to address the sparse, high-dimensional matrix computation problem of text data. After feature extraction with the DBN, softmax regression is employed to classify the text in the learned feature space. In the pre-training stage, the DBN and the softmax regression are first trained separately. Then, in the fine-tuning stage, they are combined into a single model and the system parameters are optimized with the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm. The experimental results on the Reuters-21578 and 20 Newsgroups corpora show that the proposed model converges in the fine-tuning stage and performs significantly better than classical algorithms such as SVM and KNN.
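The combined model described above can be pictured as a stack of sigmoid layers (standing in for the DBN feature extractor) followed by a softmax output layer. The sketch below shows only the forward pass of such a stack, under stated assumptions: the layer sizes, the random initialization, and the names `init_layer`, `forward`, and `doc` are hypothetical and are not taken from the paper. RBM pre-training and L-BFGS fine-tuning are not implemented here.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used by each hidden layer."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn raw class scores into probabilities (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Compute W x + b, where `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (assumed initialization)."""
    weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def forward(hidden_layers, softmax_layer, x):
    """Pass a document vector through the sigmoid stack, then softmax."""
    h = x
    for weights, bias in hidden_layers:
        h = [sigmoid(a) for a in affine(weights, bias, h)]
    w, b = softmax_layer
    return softmax(affine(w, b, h))


rng = random.Random(0)
layer_sizes = [6, 4, 3]   # hypothetical: 6 input terms, two hidden layers
n_classes = 2             # hypothetical number of categories

hidden_layers = [init_layer(a, b, rng)
                 for a, b in zip(layer_sizes, layer_sizes[1:])]
softmax_layer = init_layer(layer_sizes[-1], n_classes, rng)

doc = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]  # toy sparse term-count vector
probs = forward(hidden_layers, softmax_layer, doc)
predicted = max(range(n_classes), key=lambda k: probs[k])
```

In a full pipeline, the hidden-layer weights would come from pre-training rather than random initialization, and all parameters would then be adjusted jointly during fine-tuning.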
