Dual Supervised Learning

Many supervised learning tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. text-to-speech, and image classification vs. image generation. Two dual tasks are intrinsically connected through the probabilistic correlation between their models. This connection, however, is rarely exploited today: the models of two dual tasks are usually trained separately and independently. In this work, we propose training the models of two dual tasks simultaneously, explicitly exploiting the probabilistic correlation between them to regularize the training process. For ease of reference, we call the proposed approach \emph{dual supervised learning}. We demonstrate that dual supervised learning can improve the practical performance of both tasks in applications including machine translation, image processing, and sentiment analysis.
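
The probabilistic correlation in question is that the primal model $P(y \mid x; \theta_{xy})$ and the dual model $P(x \mid y; \theta_{yx})$ parameterize two factorizations of the same joint distribution, so ideally $P(x)\,P(y \mid x; \theta_{xy}) = P(y)\,P(x \mid y; \theta_{yx})$. The sketch below illustrates one way to turn this constraint into a training regularizer; it is a minimal illustration in PyTorch, not the paper's exact implementation. The helper name `dual_supervised_loss`, the weights `lambda_xy`/`lambda_yx`, and the assumption that the marginals come from pretrained density models (e.g. language models) held fixed during training are ours for illustration.

```python
import torch

def dual_supervised_loss(log_p_x, log_p_y, log_p_y_given_x, log_p_x_given_y,
                         lambda_xy=0.01, lambda_yx=0.01):
    """Joint loss for a pair of dual models (hypothetical helper).

    log_p_x, log_p_y:     log-marginals of x and y from pretrained models,
                          treated as constants during training.
    log_p_y_given_x:      per-example log-likelihood of the primal model.
    log_p_x_given_y:      per-example log-likelihood of the dual model.
    All arguments are 1-D tensors over a minibatch.
    """
    # Standard maximum-likelihood objectives for the two tasks.
    nll_primal = -log_p_y_given_x.mean()
    nll_dual = -log_p_x_given_y.mean()

    # Probabilistic duality: both factorizations describe the same joint
    # P(x, y), so their log-probabilities should agree per example.
    duality_gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    duality_reg = (duality_gap ** 2).mean()

    # Each task minimizes its own NLL plus the shared duality penalty.
    loss_primal = nll_primal + lambda_xy * duality_reg
    loss_dual = nll_dual + lambda_yx * duality_reg
    return loss_primal, loss_dual
```

Because the duality penalty is shared, its gradient flows into both models: each model's likelihood is pulled toward consistency with its dual, which is how the joint training regularizes both tasks at once.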
