Efficient coding leads to novel features for speech recognition

The principle of efficient coding of stimuli can explain the receptive fields in both the primary visual cortex and the primary auditory cortex. When efficient coding is applied to a generative model, the model learns biologically realistic basis functions, and the code it produces has the spike-like property found in the cortex. We show that this representation can be used for isolated-word speech recognition. We trained a temporal generative model on spoken single digits. The resulting code is spatiotemporal and spike-like, and we classified it with a k-nearest neighbour classifier. The network classifies 92% of test samples correctly.
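
To make the classification step concrete, the following is a minimal sketch of k-nearest neighbour classification applied to spike-like spatiotemporal codes. It is an illustration under stated assumptions, not the configuration used in this work: the Euclidean distance, the value k = 3, the code shape (4 basis functions by 10 time steps), the helper names `toy_code` and `knn_predict`, and the toy data are all hypothetical.

```python
import numpy as np
from collections import Counter


def toy_code(spike_positions, shape=(4, 10)):
    """Build a hypothetical spike-like code: zeros except at (unit, time) spikes."""
    code = np.zeros(shape)
    for unit, t in spike_positions:
        code[unit, t] = 1.0
    return code


def knn_predict(train_codes, train_labels, query_code, k=3):
    """Label one code by majority vote among its k nearest training codes."""
    # Flatten each (basis function x time) code into a vector.
    train_flat = train_codes.reshape(len(train_codes), -1)
    query_flat = query_code.reshape(-1)
    # Euclidean distance is an assumption; the abstract does not name a metric.
    dists = np.linalg.norm(train_flat - query_flat, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two classes whose spikes fall on different units and times.
train_codes = np.stack([
    toy_code([(0, 1), (2, 5)]),  # class 0
    toy_code([(0, 2), (2, 5)]),  # class 0
    toy_code([(0, 1), (2, 6)]),  # class 0
    toy_code([(1, 3), (3, 8)]),  # class 1
    toy_code([(1, 3), (3, 7)]),  # class 1
    toy_code([(1, 4), (3, 8)]),  # class 1
])
train_labels = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_codes, train_labels, toy_code([(0, 1), (2, 5)]))
pred_b = knn_predict(train_codes, train_labels, toy_code([(1, 3), (3, 8)]))
print(pred_a, pred_b)
```

Flattening the code discards no information but makes the distance sensitive to exact spike timing, so a practical system might smooth or align the codes before comparison; that choice is outside the scope of this sketch.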