Reconsidering the computational model for the auditory nerve: unsupervised learning or task-based optimization?

Efficient coding has been a leading computational principle for understanding sensory systems. Auditory nerve responses have been explained by unsupervised learning of natural sounds: the fibers' filter shapes resemble a basis optimized to encode human speech. However, previous work did not clearly distinguish between sounds recorded in studios and sounds recorded in natural environments. We found that the unsupervised model fails to explain the auditory nerve when applied to environmental recordings, because of reverberation. How, then, can we model the auditory nerve in a way that accounts for such environmental modulation? We hypothesized that the auditory nerve is optimized to perform the auditory tasks we face in the environment. To test this, we trained a deep convolutional neural network to classify phonemes from their reverberated waveforms. The filters learned in the first layer showed characteristics similar to those of auditory nerve fibers. These results suggest that the auditory nerve efficiently encodes task-relevant information rather than the entire incoming signal.
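The reverberated training waveforms described above could be generated, for example, by convolving dry speech with a room impulse response. The abstract does not specify the procedure; below is a minimal sketch using a synthetic impulse response (exponentially decaying white noise), with all parameter values hypothetical:

```python
import numpy as np

def synthetic_rir(fs=16000, rt60=0.5, length_s=0.3, seed=0):
    """Synthetic room impulse response: exponentially decaying white noise.

    rt60 is the time (s) for the energy to decay by 60 dB; the amplitude
    envelope exp(-3*ln(10)*t/rt60) gives that decay rate. Values are
    illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    return rng.standard_normal(n) * envelope

def reverberate(dry, rir):
    """Convolve a dry waveform with an impulse response, truncate to the
    original length, and peak-normalize the result."""
    wet = np.convolve(dry, rir)[: len(dry)]
    return wet / (np.max(np.abs(wet)) + 1e-12)

# Example: reverberate a 1-second, 440 Hz tone burst.
fs = 16000
t = np.arange(fs) / fs
dry = np.sin(2 * np.pi * 440.0 * t)
wet = reverberate(dry, synthetic_rir(fs=fs))
```

A network would then be trained on `wet` rather than `dry` waveforms, so that the first-layer filters must cope with the temporal smearing that reverberation introduces.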
