A Biologically-Inspired Low-Bit-Rate Universal Audio Coder

We propose a new biologically-inspired paradigm for universal audio coding based on neural spikes. The approach generates sparse 2-D representations of audio signals, dubbed spikegrams, by projecting the signal onto an overcomplete set of adaptive gammachirp kernels (gammatones with additional tuning parameters). A masking model is then applied to the spikegrams to remove inaudible spikes and increase coding efficiency. The paradigm proposed in this paper is a first step towards a high-quality audio encoder that further processes the acoustical events generated in the spikegrams. After the necessary optimization and fine-tuning, our coding system, operating at 1 bit/sample for sound sampled at 44.1 kHz, is expected to deliver high-quality audio for broadcasting and for other applications such as archiving and audio recording.
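To make the idea of a spikegram concrete, the following is a minimal sketch rather than the coder described above. It assumes a greedy matching-pursuit projection, fixed gammatone kernels in place of adaptive gammachirps (the chirp and other tuning parameters are omitted), the Glasberg-Moore ERB bandwidth formula, and no masking model. The names `gammatone` and `spikegram` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def gammatone(fc, fs, duration=0.02, order=4):
    """Unit-norm gammatone kernel centred at fc Hz (assumed stand-in for a gammachirp)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc            # equivalent rectangular bandwidth in Hz (assumed formula)
    b = 1.019 * erb                    # bandwidth parameter of the envelope
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def spikegram(signal, kernels, n_spikes):
    """Greedy matching pursuit: returns (sample, channel, amplitude) spikes and the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for ch, k in enumerate(kernels):
            # Inner product of the residual with the kernel at every time shift.
            corr = np.correlate(residual, k, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (pos, ch, corr[pos])
        pos, ch, amp = best
        # Remove the selected kernel's contribution from the residual.
        residual[pos:pos + len(kernels[ch])] -= amp * kernels[ch]
        spikes.append((pos, ch, float(amp)))
    return spikes, residual

if __name__ == "__main__":
    fs = 8000
    kernels = [gammatone(fc, fs) for fc in (300.0, 1000.0, 2500.0)]
    x = np.zeros(1000)
    x[100:100 + len(kernels[1])] = 0.5 * kernels[1]   # one synthetic acoustic event
    spikes, residual = spikegram(x, kernels, n_spikes=1)
    print(spikes, np.linalg.norm(residual))
```

Each spike is a (time, channel, amplitude) triple, so the list of spikes is the sparse 2-D representation. For scale, the stated operating point of 1 bit/sample at 44.1 kHz corresponds to 44,100 bits per second per channel.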
