A Robust Text Classifier Based on Denoising Deep Neural Network in the Analysis of Big Data

Text classification is a long-standing problem in natural language processing (NLP). In the era of big data, a good text classifier is critical to applying NLP to scientific big data analytics. The ever-increasing volume of text data poses significant challenges for developing effective text classification algorithms. Given the success of deep neural networks (DNNs) in analyzing big data, this article proposes a novel DNN-based text classifier, in an effort to improve computational performance on big text data with hybrid outliers. Specifically, by combining the denoising autoencoder (DAE) and the restricted Boltzmann machine (RBM), the proposed method, named the denoising deep neural network (DDNN), achieves significantly better noise robustness and feature extraction than traditional text classification algorithms. Simulations on benchmark datasets verify the effectiveness and robustness of the proposed text classifier.
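As a rough illustration of the two building blocks named in the abstract, the following NumPy sketch shows a denoising autoencoder pass (corrupt the input, then reconstruct the clean input) and a single contrastive-divergence (CD-1) update for a Bernoulli RBM. It is not the authors' implementation: the abstract does not specify how the DAE and RBM layers are arranged in DDNN, so the function names, tied weights, masking noise, layer sizes, and learning rate below are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_corrupt(x, drop_prob, rng):
    """Masking noise: zero each entry independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def dae_reconstruct(x_noisy, W, b_hid, b_vis):
    """Tied-weight autoencoder (an assumption): encode the corrupted input, decode it."""
    h = sigmoid(x_noisy @ W + b_hid)
    return sigmoid(h @ W.T + b_vis), h

def rbm_cd1_step(v0, W, b_hid, b_vis, lr, rng):
    """One CD-1 update for a Bernoulli RBM; returns the updated parameters."""
    p_h0 = sigmoid(v0 @ W + b_hid)                       # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                     # reconstruct visibles
    p_h1 = sigmoid(p_v1 @ W + b_hid)                     # negative phase
    n = v0.shape[0]
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_hid, b_vis

rng = np.random.default_rng(0)
# Toy binary bag-of-words batch: 8 documents, 20 terms (hypothetical sizes).
X = (rng.random((8, 20)) < 0.3).astype(float)

W_dae = 0.01 * rng.standard_normal((20, 10))
X_noisy = mask_corrupt(X, 0.25, rng)
recon, H = dae_reconstruct(X_noisy, W_dae, np.zeros(10), np.zeros(20))
# The denoising objective compares the reconstruction with the CLEAN input X.
dae_loss = -np.mean(X * np.log(recon) + (1 - X) * np.log(1 - recon))

# Feed the DAE hidden activations (values in (0, 1)) to an RBM as visible units.
W_rbm = 0.01 * rng.standard_normal((10, 6))
W_rbm, b_hid, b_vis = rbm_cd1_step(H, W_rbm, np.zeros(6), np.zeros(10), 0.1, rng)
```

The key design point is that the autoencoder is trained to recover the uncorrupted input from a corrupted copy, which is what makes the learned features tolerant to noise; the RBM step then learns a further representation on top of those features. A full classifier would add gradient updates for the DAE and a supervised output layer, neither of which is shown here.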
