Privacy-preserving speech analysis using emotion filtering at the edge: poster abstract

Voice-controlled devices and services are commonplace in consumer IoT. Cloud-based analysis services extract information from voice input using speech recognition techniques. Service providers can thereby build detailed profiles of users' demographics, preferences, and emotional states, and may therefore significantly compromise privacy. To address this problem, we propose a privacy-preserving intermediate layer between users and cloud services that sanitizes voice input directly on edge devices, forwarding only emotion-neutralized signals. We show that a trained model based on CycleGAN, deployed on a Raspberry Pi, enables identification and removal of sensitive emotional-state information by ~91%, with minimal loss in speech recognition accuracy.
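The edge-side sanitization pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names are hypothetical, the framing step stands in for real spectral feature extraction, and a simple smoothing filter stands in for the trained CycleGAN generator that would map emotional speech features to emotion-neutral ones before forwarding.

```python
import numpy as np


def extract_features(audio: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Frame the waveform into fixed-length analysis windows.

    Stand-in for the spectral features (e.g. mel spectrograms) a real
    pipeline would compute on the edge device.
    """
    n_frames = len(audio) // frame_len
    return audio[: n_frames * frame_len].reshape(n_frames, frame_len)


def neutralize(features: np.ndarray) -> np.ndarray:
    """Placeholder for the trained CycleGAN generator.

    In the actual system, a generator network maps features of emotional
    speech to emotion-neutral features; here a moving-average filter is
    used purely so the sketch runs end to end.
    """
    kernel = np.ones(5) / 5.0
    return np.apply_along_axis(
        lambda frame: np.convolve(frame, kernel, mode="same"), 1, features
    )


def sanitize(audio: np.ndarray) -> np.ndarray:
    """Sanitize a voice signal on-device, then return the neutralized
    signal that would be forwarded to the cloud service."""
    features = extract_features(audio)
    neutral = neutralize(features)
    return neutral.reshape(-1)


# Example: sanitize one second of dummy audio before forwarding.
audio = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
forwarded = sanitize(audio)
assert forwarded.shape == (1024,)
```

Only the output of `sanitize` ever leaves the device, so the cloud service receives a signal from which the emotion-bearing components have (in the real system) been filtered out, while the linguistic content needed for speech recognition is preserved.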