论文信息 - Learning to Remember Rare Events - 字舞流文

Learning to Remember Rare Events

Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.

Aurko Roy | Samy Bengio | Lukasz Kaiser | Ofir Nachum | Aurko Roy | Lukasz Kaiser | Samy Bengio | Ofir Nachum

[1] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[2] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[3] Kilian Q. Weinberger,et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[4] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Alexandr Andoni,et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[6] Joshua B. Tenenbaum,et al. One shot learning of simple visual concepts , 2011, CogSci.

[7] Jason Weston,et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[8] Tara N. Sainath,et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[9] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[11] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[12] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[13] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Jason Weston,et al. Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.

[15] Jason Weston,et al. Memory Networks , 2014, ICLR.

[16] Jason Weston,et al. Weakly Supervised Memory Networks , 2015, ArXiv.

[17] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18] Wojciech Zaremba,et al. Reinforcement Learning Neural Turing Machines , 2015, ArXiv.

[19] Gregory R. Koch,et al. Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[20] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[21] Jason Weston,et al. The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations , 2015, ICLR.

[22] Samy Bengio,et al. Can Active Memory Replace Attention? , 2016, NIPS.

[23] Yoshua Bengio,et al. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes , 2016, ArXiv.

[24] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.

[25] Alex Graves,et al. Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes , 2016, NIPS.

[26] Pascal Vincent,et al. Hierarchical Memory Networks , 2016, ArXiv.

[27] Samy Bengio,et al. Order Matters: Sequence to sequence for sets , 2015, ICLR.

[28] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[29] Omer Levy,et al. Published as a conference paper at ICLR 2018 S IMULATING A CTION D YNAMICS WITH N EURAL P ROCESS N ETWORKS , 2018 .