Compressed Superposition of Neural Networks for Deep Learning in Edge Computing

This paper investigates the combination of two recently proposed techniques: superposition of multiple neural networks into one, and neural network compression. We show that the two techniques can be combined successfully and that the combination has considerable potential for reducing the size of deep convolutional neural networks (CNNs). The work is relevant to deploying deep learning on low-end computing devices, since it allows neural networks to fit on edge devices with constrained computational resources (e.g., sensors, mobile devices, controllers). We study the trade-off between the model compression rate and the accuracy of the superimposed tasks, and we present a CNN pipeline in which the fully connected layers are isolated from the convolutional layers and serve as a general-purpose neural processing unit for several CNN models. We show that deep models can be highly compressed with limited accuracy degradation when additional compression is performed within the superposition framework.
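To make the idea of superposition concrete, the following is a minimal sketch of one common formulation, in which the weight matrices of several fully connected layers are summed into a single matrix after each is bound to a random ±1 context vector. The choice of binary contexts, the layer sizes, and the helper names (`make_contexts`, `superimpose`, `forward`) are assumptions made for illustration only; the abstract does not specify the formulation used in this work.

```python
import numpy as np

def make_contexts(num_tasks, dim, rng):
    # One random +/-1 context vector per task (assumed context type).
    return rng.choice([-1.0, 1.0], size=(num_tasks, dim))

def superimpose(weights, contexts):
    # weights: list of (out, in) matrices; contexts: (tasks, in) array.
    # Each matrix has its columns multiplied by its context, then all are summed,
    # so the result occupies the memory of a single matrix.
    return sum(W * c for W, c in zip(weights, contexts))

def forward(W_super, context, x):
    # Bind the input to the task context, then apply the shared weights.
    # For +/-1 contexts, c * c == 1, so the selected task's weights are recovered.
    return W_super @ (context * x)

rng = np.random.default_rng(0)
num_tasks, d_in, d_out = 3, 4096, 8  # hypothetical sizes
weights = [rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
           for _ in range(num_tasks)]
contexts = make_contexts(num_tasks, d_in, rng)
W_super = superimpose(weights, contexts)

x = rng.standard_normal(d_in)
y_true = weights[1] @ x                      # output of task 1 on its own
y_super = forward(W_super, contexts[1], x)   # output retrieved from superposition
interference = y_super - y_true              # cross-talk from the other tasks
```

With a single stored model the retrieval is exact. With several models the retrieved output contains a cross-talk term contributed by the other stored models, and for untrained random weights such as those above this term is not negligible. The sketch therefore shows only the storage and retrieval mechanics, not the accuracy that trained models would achieve.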