Sequence to Sequence Learning with Neural Networks
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increased to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
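The encoder half of the architecture described above can be sketched in plain NumPy. This is a toy single-layer LSTM with made-up dimensions and untrained random weights, not the paper's multilayer model; it only illustrates the two ideas in the abstract: the encoder compresses a variable-length token sequence into one fixed-dimensional vector, and the source tokens are fed in reversed order.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMEncoder:
    """Toy single-layer LSTM encoder: maps a token sequence to one fixed vector."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0.0, 0.1, (vocab_size, embed_dim))
        # Input, forget, cell, and output gate weights stacked into one matrix.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, token_ids, reverse=True):
        # Reversing the source introduces short-term dependencies between the
        # start of the source and the start of the target (per the abstract).
        if reverse:
            token_ids = token_ids[::-1]
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for t in token_ids:
            z = self.W @ np.concatenate([self.embed[t], h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h  # fixed-dimensional summary of the whole sequence

enc = LSTMEncoder(vocab_size=50, embed_dim=8, hidden_dim=16)
v = enc.encode([3, 14, 15, 9, 2])
print(v.shape)  # (16,) regardless of the input sequence length
```

In the full model, a second (decoder) LSTM would be initialized from this vector and trained to emit the target sentence token by token.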
The Limitations of Deep Learning in Adversarial Settings
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
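The core idea — rank input features by how strongly they push the model toward an adversary-chosen class, then modify only the most influential few — can be illustrated with a deliberately simple stand-in. The paper computes a saliency map from the forward derivative of a DNN; the sketch below substitutes a linear classifier (whose "forward derivative" is just its weight vector), with entirely made-up weights, input, and budget, so it is an analogy rather than the paper's algorithm.

```python
import numpy as np

def predict(w, x):
    """Binary stand-in for a DNN: class 1 if the logit w·x is positive."""
    return int(w @ x > 0)

# Hypothetical pre-trained weights and a benign input (both invented here).
w = np.array([2.0, -1.0, 0.5, 0.1, -3.0])
x = np.array([0.5, 0.2, 0.3, 0.9, 0.1])
assert predict(w, x) == 1  # the benign input is classified as class 1

# Saliency-style attack: for a linear model the derivative of the logit with
# respect to each feature is just w, so the most influential features are
# those with the largest |w|. Perturb only that small subset, mimicking the
# paper's small fraction of modified input features.
budget, step = 2, 0.5
picks = np.argsort(-np.abs(w))[:budget]
x_adv = x.copy()
x_adv[picks] -= np.sign(w[picks]) * step  # push the logit toward class 0

print(predict(w, x_adv))  # 0: misclassified after touching 2 of 5 features
```

In the actual attack, the saliency map is recomputed from the DNN's Jacobian after each feature modification, and the process repeats until the target class is reached or a distortion budget is exhausted.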