Semi-Supervised Seq2seq Joint-Stochastic-Approximation Autoencoders With Applications to Semantic Parsing

Developing Semi-Supervised Seq2Seq ($S^4$) learning for sequence transduction tasks in natural language processing (NLP), e.g., semantic parsing, is challenging, since both the input and the output sequences are discrete. This discrete nature poses difficulties for methods that require gradients from either the input space or the output space. Recently, a new learning method called joint stochastic approximation (JSA) was developed for unsupervised learning of fixed-dimensional autoencoders; it theoretically avoids gradient propagation through discrete latent variables, a problem from which Variational Auto-Encoders (VAEs) suffer. In this letter, we propose seq2seq Joint-stochastic-approximation Auto-Encoders (JAEs) and apply them to $S^4$ learning for NLP sequence transduction tasks. Further, we propose bi-directional JAEs (called bi-JAEs) to leverage not only unpaired input sequences (the most commonly studied setting) but also unpaired output sequences. Experiments on two benchmark datasets for semantic parsing show that JAEs consistently outperform VAEs in $S^4$ learning and that bi-JAEs yield further improvements.
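To make the contrast with VAEs concrete, the following is a minimal sketch of a JSA-style update for a discrete latent sequence; the notation ($x$, $z$, $\theta$, $\phi$, $\eta$) is introduced here for illustration and is not taken from the letter. Given an unpaired input sequence $x$ and a discrete latent output sequence $z$, with generative model $p_\theta(x, z)$ and inference model $q_\phi(z \mid x)$, one first draws $z$ approximately from the posterior $p_\theta(z \mid x)$, e.g., via a Metropolis independence sampler that uses $q_\phi(z \mid x)$ as the proposal, and then applies

$$\theta \leftarrow \theta + \eta \, \nabla_\theta \log p_\theta(x, z), \qquad \phi \leftarrow \phi + \eta \, \nabla_\phi \log q_\phi(z \mid x).$$

Both gradients are computed with the sampled $z$ held fixed, so no gradient ever needs to flow through the discrete latent. By contrast, optimizing the VAE objective $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x, z) - \log q_\phi(z \mid x)]$ with respect to $\phi$ requires score-function (REINFORCE) estimators or continuous relaxations when $z$ is discrete.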
