Text feature extraction based on stacked variational autoencoder

Abstract This paper presents a text feature extraction model based on a stacked variational autoencoder (SVAE). A noise reduction mechanism is designed for the variational autoencoder at the input layer of the text feature extractor to reduce noise interference and improve the robustness and feature discrimination of the model. Three deep SVAE network architectures are constructed to strengthen representation learning and mine intrinsic feature information in depth. Experiments are carried out along several lines, including comparative analysis of the text feature extraction model, sparsity performance, parameter selection, and stacking. The results show that the SVAE text feature extraction model performs well: on the Fudan and Reuters datasets, the highest accuracy of the SVAE models is 13.50% and 8.96% higher, respectively, than that of PCA.
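As a rough illustration of the idea summarized above (not the authors' implementation), the sketch below shows one variational autoencoder layer with input-side noise injection, of the kind that could be stacked to form an SVAE over bag-of-words or TF-IDF text vectors. All names (DenoisingVAE, noise_std, vae_loss) and the choice of PyTorch are assumptions for illustration only.

```python
# Minimal sketch of one denoising VAE layer that could be stacked into an SVAE.
# Assumes dense text feature vectors (e.g., TF-IDF) as input; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingVAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, input_dim))

    def encode(self, x):
        # Noise-reduction mechanism: corrupt the input during training so the
        # encoder learns features that are robust to input noise.
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        recon = self.decoder(z)
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error against the clean input plus the KL regularizer.
    recon_loss = F.mse_loss(recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```

Under this sketch, stacking would proceed greedily: after training one layer, its latent means would serve as the input features for the next DenoisingVAE, yielding progressively deeper text representations.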
