SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

As the size of large language models continues to grow, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead of model inference. While SNNs have become competitive with non-spiking models on many computer vision tasks, they have also proven more difficult to train. As a result, their performance lags behind modern deep learning, and their effectiveness in language generation has yet to be demonstrated. In this paper, inspired by the RWKV language model, we successfully implement 'SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train the proposed model in three variants: 45M, 125M, and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block, replacing multi-head self-attention with a mechanism whose computational complexity scales linearly rather than quadratically with sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks while consuming roughly 5x less energy when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
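To make the two key ingredients concrete, the sketch below illustrates (i) a binary, event-driven spiking activation trained with a surrogate gradient and (ii) an RWKV-style recurrent token mixer that consumes tokens one at a time, so its cost grows linearly rather than quadratically with sequence length. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; the class names, the fast-sigmoid surrogate, and the exponential-decay mixing formulation are assumptions made for illustration. The actual model code is available at the repository linked above.

```python
import torch
import torch.nn as nn

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, mem):
        ctx.save_for_backward(mem)
        return (mem > 0).float()            # binary, event-driven output

    @staticmethod
    def backward(ctx, grad_out):
        (mem,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: d(spike)/d(mem) ~ 1 / (1 + |mem|)^2 (illustrative choice)
        return grad_out / (1.0 + mem.abs()) ** 2


class RecurrentSpikingMixer(nn.Module):
    """Toy RWKV-style token mixer: tokens are streamed in sequentially and mixed
    through an exponentially decaying running state, so the cost is linear in
    sequence length. A binary spiking gate replaces the usual sigmoid receptance."""
    def __init__(self, d_model):
        super().__init__()
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.receptance = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.zeros(d_model))   # learned per-channel decay

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        num = torch.zeros(B, D, device=x.device)   # running weighted sum of values
        den = torch.zeros(B, D, device=x.device)   # running sum of weights
        w = torch.exp(-torch.exp(self.decay))      # decay factor in (0, 1)
        outs = []
        for t in range(T):                         # sequential streaming: O(T), not O(T^2)
            k = torch.exp(self.key(x[:, t]))
            v = self.value(x[:, t])
            num = w * num + k * v
            den = w * den + k
            r = SpikeFn.apply(self.receptance(x[:, t]))  # binary spiking gate
            outs.append(r * num / (den + 1e-8))
        return torch.stack(outs, dim=1)


if __name__ == "__main__":
    mixer = RecurrentSpikingMixer(d_model=64)
    tokens = torch.randn(2, 16, 64)          # (batch, seq_len, d_model)
    print(mixer(tokens).shape)               # torch.Size([2, 16, 64])
```

Because the running state carries all past context, this style of token mixing fits naturally with SNNs, which likewise process inputs as a stream of discrete events rather than attending over the full sequence at once.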
