Learning optimal codes for natural images and sounds

A Bayesian method for inferring an optimal basis is applied to the problem of finding efficient codes for natural images. The key to the algorithm is multivariate non-Gaussian density estimation, which is equivalent, in various forms, to sparse coding or independent component analysis. The basis functions learned by the algorithm are oriented and localized in both space and frequency, resembling both Gabor wavelet functions and the spatial receptive fields of neurons in the primary visual cortex. An important advantage of the probabilistic framework is that it provides an objective method for comparing the coding efficiency of different bases. The learned bases are shown to have better coding efficiency than traditional Fourier and wavelet bases. The framework can also be used to learn efficient codes for natural sounds, and the learned codes share many of the coding properties of the cochlear nerve. Time-frequency analysis shows that both Fourier-like and wavelet-like representations can be derived by learning efficient codes for different classes of natural sounds.
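As a rough illustration of the ideas in the abstract, the sketch below learns a basis by maximum-likelihood independent component analysis and then compares two bases by their estimated code length in bits. It is not the algorithm of the paper: it assumes a complete (square), noiseless model with a unit-scale Laplacian prior on the coefficients and a natural-gradient update, and the function names `learn_ica_basis` and `bits_per_sample` are hypothetical.

```python
import numpy as np

def learn_ica_basis(X, n_iter=500, lr=0.05, batch=512, seed=0):
    """Fit an unmixing matrix W by stochastic natural-gradient ascent.

    Assumed model (an illustration, not the paper's exact method):
    x = A s with A square, s_i independent with p(s) = 0.5 * exp(-|s|).
    X has shape (dim, n_samples) and is assumed zero-mean.
    The learned basis functions are the columns of inv(W).
    """
    rng = np.random.default_rng(seed)
    dim, n = X.shape
    W = np.eye(dim)
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=batch)
        S = W @ X[:, idx]  # current coefficient estimates
        # Natural gradient of the log-likelihood: (I - E[sign(s) s^T]) W
        grad = (np.eye(dim) - np.sign(S) @ S.T / batch) @ W
        W += lr * grad
    return W

def bits_per_sample(W, X):
    """Average negative log2-likelihood of X under the Laplacian model.

    Lower values mean the basis gives a more efficient code for the data
    (for continuous data this is a relative measure, not an absolute rate).
    """
    S = W @ X
    log_prior = np.sum(-np.abs(S) - np.log(2.0), axis=0)  # sum_i log p(s_i)
    log_det = np.linalg.slogdet(W)[1]                     # log |det W|
    return -(log_prior.mean() + log_det) / np.log(2.0)
```

On image data, `X` would hold whitened, vectorized patches as columns; comparing `bits_per_sample` for the learned `W` against a fixed transform (for example a Fourier or wavelet matrix) mirrors the kind of objective comparison the abstract describes, under the assumptions stated above.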
