The promise of automated driving is prompting the automotive and consumer electronics (CE) sectors to rethink not only what it means to drive, but also the relationship between the car and the consumer. The recent Internet of Vehicle Things (IoVT) trend promotes rich interaction between humans and vehicles, ultimately aiming to augment human abilities, such as hearing, visual surveillance, and emotion awareness, as part of vehicle safety. Voice-based interaction (speech recognition and stress monitoring) improves timely awareness of the vehicle's status. Unfortunately, existing modulation-domain speech enhancement techniques perform poorly at detecting human stress emotions when environmental noise is unavoidable and varies with the location of every passing vehicle. Furthermore, their computational load makes them difficult to implement in automated vehicles. In this direction, we propose a front-end processing framework targeted at stress emotion detection (anger, sadness, fear, and happiness) in nonstationary noisy environments, such as car, airport, traffic, and train noise. This article addresses three interrelated issues: 1) analysis, modification, and synthesis of noisy emotional speech in the modulation domain under real-world background noise; 2) extraction of Mel-frequency cepstral coefficient (MFCC) features from noisy speech stimuli for speech emotion recognition; and 3) evaluation of overall system performance by means of objective measures and confusion matrices in adverse environments, using the Interactive Emotional Dyadic Motion Capture (IEMOCAP) speech emotion database. The experimental results show that the framework delivers favorable state-of-the-art stress-monitoring performance, yielding higher consumer satisfaction with in-vehicle security than traditional frameworks.
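The first issue, modulation-domain processing of noisy speech, can be illustrated with a minimal sketch of spectral subtraction in the short-time modulation domain: an acoustic STFT, a second STFT along each frequency bin's magnitude trajectory, magnitude subtraction there, and resynthesis with the noisy phases. This is not the authors' implementation; the function name, frame lengths, the crude leading-frames noise estimate (a stand-in for minimum statistics), and the spectral floor are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def modulation_spectral_subtraction(x, fs, nperseg=256, mod_nperseg=16,
                                    n_noise_frames=3):
    """Sketch of spectral subtraction in the short-time modulation domain."""
    # Acoustic STFT of the noisy signal
    _, t, X = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    frame_rate = 1.0 / (t[1] - t[0])  # acoustic frames per second

    # Modulation STFT: a second STFT along each frequency bin's time trajectory
    _, _, M = stft(mag, frame_rate, nperseg=mod_nperseg, axis=-1)
    Mmag, Mphase = np.abs(M), np.angle(M)

    # Noise estimate: mean modulation magnitude over the first few modulation
    # frames, assumed noise-dominated (illustrative; not minimum statistics)
    noise = Mmag[..., :n_noise_frames].mean(axis=-1, keepdims=True)

    # Magnitude subtraction with a small spectral floor to limit musical noise
    Mclean = np.maximum(Mmag - noise, 0.01 * Mmag)

    # Inverse modulation STFT, reusing the noisy modulation phase
    _, mag_hat = istft(Mclean * np.exp(1j * Mphase), frame_rate,
                       nperseg=mod_nperseg)
    mag_hat = np.maximum(mag_hat[..., :mag.shape[-1]], 0.0)

    # Resynthesize with the noisy acoustic phase
    _, y = istft(mag_hat * np.exp(1j * phase[..., :mag_hat.shape[-1]]),
                 fs, nperseg=nperseg)
    return y
```

The noisy phases are reused in both domains, so only the modulation magnitudes are modified, which is the defining property of this class of enhancers.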
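For the second issue, MFCC features can be computed from a power spectrogram with a triangular mel filterbank followed by a log and a DCT. The sketch below is a textbook pipeline, not the paper's exact front end; the frame length, the number of mel filters, and the number of retained coefficients are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(x, fs, n_mels=26, n_ceps=13, nperseg=512):
    """Textbook MFCC sketch: power spectrum -> mel filterbank -> log -> DCT."""
    _, _, X = stft(x, fs, nperseg=nperseg)
    power = np.abs(X) ** 2  # (n_freq, n_frames)

    # Triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((nperseg + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_mels, nperseg // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)

    # Log mel energies, then DCT-II; keep the first n_ceps coefficients
    mel_energy = np.log(fb @ power + 1e-10)
    return dct(mel_energy, type=2, axis=0, norm='ortho')[:n_ceps]
```

The returned array has shape `(n_ceps, n_frames)`, one cepstral vector per analysis frame, which is the feature stream an emotion classifier would consume.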
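For the third issue, the confusion-matrix evaluation can be sketched in a few lines: rows index the true emotion class, columns the predicted one, and per-class recall is the diagonal over the row sums. The helper names and the four-label example are illustrative, not taken from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, labels):
    """Rows: true emotion class; columns: predicted emotion class."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm

def per_class_recall(cm):
    """Diagonal over row sums: fraction of each true class recognized."""
    totals = cm.sum(axis=1)
    return np.where(totals > 0, np.diag(cm) / np.maximum(totals, 1), 0.0)
```

A quick check with the four stress classes named in the abstract:

```python
labels = ["anger", "sadness", "fear", "happiness"]
y_true = ["anger", "anger", "fear", "happiness", "sadness"]
y_pred = ["anger", "fear", "fear", "happiness", "sadness"]
cm = confusion_matrix(y_true, y_pred, labels)
# One of the two anger utterances was confused with fear: recall 0.5 for anger.
```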
[1] Biplab Sikdar et al., "Consumer IoT: Security Vulnerability Case Studies and Solutions," IEEE Consumer Electronics Magazine, 2020.
[2] Christos-Savvas Bouganis et al., "Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars," IEEE Consumer Electronics Magazine, 2019.
[3] Kuldip K. Paliwal et al., "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Communication, 2010.
[4] Carlos Busso et al., "IEMOCAP: interactive emotional dyadic motion capture database," Language Resources and Evaluation, 2008.
[5] Rainer Martin et al., "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Transactions on Speech and Audio Processing, 2001.