Neuromorphic speech processing for noisy environments

Current speech recognition systems perform poorly in the presence of background noise, particularly at signal-to-noise ratios (SNR) below 10 dB and under certain noise conditions such as cafeteria noise. In this study we investigate acoustic processing based on cochlear models and neural-like processing as a means of arriving at a noise-robust acoustic representation of speech. Unlike previous work based on cochlear models, which used filter parameters derived from neurophysiological data, we optimize cochlear filter shapes and thresholds to reduce the noise contribution in the resulting acoustic representations. Results suggest that average SNR improvements on the order of 5-10 dB can be obtained for noise-corrupted signals with SNRs near 0-6 dB under realistic noise such as cafeteria noise. Furthermore, using a neural network to incorporate context and arrive at a lower-dimensional representation can lead to further improvements in SNR.
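The abstract does not specify the front-end implementation. The sketch below is a minimal illustration, not the authors' method: it assumes a gammatone filterbank (a common choice of cochlear model) followed by half-wave rectification and a fixed threshold nonlinearity that suppresses low-level activity, where additive noise tends to concentrate. All function names and parameter values (n_channels, thresh, the ERB formula constants) are illustrative assumptions; the paper optimizes filter shapes and thresholds rather than fixing them as done here.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a 4th-order gammatone filter centered at fc Hz."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)  # bandwidth scaling for a 4th-order gammatone
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # crude peak normalization

def cochleagram(x, fs, n_channels=24, fmin=100.0, fmax=4000.0, thresh=0.01):
    """Cochlear-model-like representation: filterbank + rectification + threshold.

    `thresh` is an assumed free parameter here; the study optimizes such
    thresholds (and the filter shapes) to minimize the noise contribution.
    """
    # Center frequencies on a roughly cochlear (log) spacing.
    fcs = np.geomspace(fmin, fmax, n_channels)
    out = []
    for fc in fcs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        y = np.maximum(y, 0.0)   # half-wave rectification (hair-cell-like)
        y[y < thresh] = 0.0      # threshold discards low-level (mostly noise) energy
        out.append(y)
    return np.vstack(out)        # shape: (n_channels, len(x))

# Example: a 1 kHz tone corrupted by white noise at roughly 0 dB SNR.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 1000 * t)
noisy = clean + np.random.randn(fs) * np.std(clean)
rep = cochleagram(noisy, fs)
print(rep.shape)  # (24, 16000)
```

In a sketch like this, the interplay between the rectifier and the threshold determines how much low-SNR channel activity survives, which is why treating the filter shapes and thresholds as optimizable parameters, as the study does, can improve on fixed neurophysiologically derived settings.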