Online Learning of Recurrent Neural Architectures by Locally Aligning Distributed Representations

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models relies on back-propagation through time, which entails unfolding the network over many time steps and makes credit assignment considerably more challenging. Furthermore, back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process very difficult. In this work, we propose the Parallel Temporal Neural Coding Network, a biologically inspired model trained with the local learning algorithm known as Local Representation Alignment, which aims to resolve the problems that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling nor the derivatives of its internal activation functions. We compare our model and learning procedure to other online alternatives to back-propagation through time (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, even outperform full back-propagation through time as well as variants such as sparse attentive back-tracking. Furthermore, we present promising experimental results that demonstrate our model's ability to conduct zero-shot adaptation.
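To make the core idea concrete, the sketch below illustrates one way an online recurrent update can avoid both unrolling and activation-function derivatives: an error signal at the output is mapped back through a fixed feedback matrix to form a local target for the hidden state, and every weight change is an outer product of locally available quantities. This is a minimal illustration under stated assumptions, not the paper's exact P-TNCN/LRA equations; the variable names (W_in, W_rec, W_out, E), the tanh nonlinearity, and the specific target and update rules are all illustrative choices.

```python
import numpy as np

class LocalRecurrentLayer:
    """Sketch of an online recurrent layer updated with purely local error signals.

    Illustrative approximation only (not the authors' exact formulation): a fixed
    feedback matrix E turns the output error into a local target for the hidden
    state, so no unrolling through time and no activation derivatives are needed.
    """

    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hid, n_in))    # input -> hidden
        self.W_rec = rng.normal(0.0, 0.1, (n_hid, n_hid))  # hidden -> hidden
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hid))  # hidden -> output
        self.E = rng.normal(0.0, 0.1, (n_hid, n_out))      # fixed error-feedback matrix
        self.h = np.zeros(n_hid)                           # hidden state carried online

    def step(self, x_t, y_t, lr=0.01, beta=0.1):
        """One online update from a single (input, target) pair."""
        h_prev = self.h
        h = np.tanh(self.W_in @ x_t + self.W_rec @ h_prev)  # new hidden state
        y_hat = self.W_out @ h                               # prediction of the target
        e_out = y_hat - y_t                                  # output error unit
        h_target = h - beta * (self.E @ e_out)               # locally aligned hidden target
        e_hid = h - h_target                                 # hidden mismatch signal

        # Purely local corrections: each is an outer product of an error signal
        # and the presynaptic activity that produced it.
        self.W_out -= lr * np.outer(e_out, h)
        self.W_in -= lr * np.outer(e_hid, x_t)
        self.W_rec -= lr * np.outer(e_hid, h_prev)
        self.h = h
        return y_hat


# Toy usage: predict the next vector of a random sequence, one step at a time,
# with no stored history beyond the current hidden state.
rng = np.random.default_rng(1)
layer = LocalRecurrentLayer(n_in=8, n_hid=32, n_out=8)
xs = rng.normal(size=(200, 8))
for t in range(len(xs) - 1):
    layer.step(xs[t], xs[t + 1])
```

Because each step touches only the current input, the previous hidden state, and the current error, the update cost per time step is constant in sequence length, which is the property the abstract emphasizes relative to back-propagation through time.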
