Environmental Sound Classification with Tiny Transformers in Noisy Edge Environments

The unprecedented growth of edge sensor infrastructure is driving the demand function for in situ analytics, i.e. automated decision support at the point of data collection. In the present work, we detail our state-of-the-art Environmental Sound Classification (ESC) framework that is capable of near real-time acoustic categorization directly at the edge. Existing ESC algorithms primarily train and test on pristine datasets that fail in real-world deployments due their inability to handle real-world noisy environments. Methods to denoise the sounds are often computationally expensive for edge devices and do not guarantee performance improvements. To this end, we investigate a way to make existing ESC models robust and make them work in operational resource-constrained settings. Our framework employs a noisy classification model consisting of a tiny BERT-based Transformer (less than 20,000 parameters) and considers hardening of this model through the use of transmission channel noise augmentation. We detail real-world results through its deployment on a Raspberry Pi Zero and demonstrate its classification performance.