Spectrum restoration from multiscale auditory phase singularities by generalized projections

We examine the encoding of acoustic spectra by parameters derived from singularities found in their multiscale auditory representations. The multiscale representation is a wavelet transform of an auditory version of the spectrum, formulated on the basis of findings from perceptual experiments and physiological research on the auditory cortex. The multiscale representation of a spectral pattern usually contains well-defined singularities in its phase function that reflect prominent features of the underlying spectrum, such as its relative peak locations and amplitudes. The properties of these singularities (their locations and strengths) are examined and used to reconstruct the original spectrum with an iterative projection algorithm. Although the singularities form a nonconvex set, simulations demonstrate that, starting from a well-chosen initial pattern, the algorithm usually converges to a good approximation of the input spectrum. Perceptually intelligible speech can be resynthesized from the reconstructed auditory spectrograms, so these singularities can potentially serve as efficient features for speech compression. In addition, the singularities are highly robust to noise, which makes them useful features in applications such as vowel recognition and speaker identification.
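The iterative projection idea can be illustrated with a minimal sketch. It is not the algorithm of this paper: as a stand-in for the singularity constraints, it assumes the known data are the Fourier magnitudes of a short signal (a nonconvex constraint set) and that the signal is nonnegative with known support (a convex set). The function names, the toy signal, and the choice of constraint sets are all assumptions made for illustration. The sketch alternates projections onto the two sets and records the distance to the magnitude constraint, which does not increase from one iteration to the next even though one set is nonconvex.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def project_magnitude(x, target_mag):
    """Project onto the nonconvex set of signals with the given DFT magnitudes:
    keep the current phase, replace the magnitude."""
    X = dft(x)
    Y = [m * cmath.exp(1j * cmath.phase(c)) for m, c in zip(target_mag, X)]
    return [v.real for v in idft(Y)]


def project_support(x, support):
    """Project onto the convex set of nonnegative signals that are zero
    outside the first `support` samples."""
    return [max(v, 0.0) if i < support else 0.0 for i, v in enumerate(x)]


def magnitude_error(x, target_mag):
    """Distance (in the DFT domain) between x and the magnitude constraint."""
    return math.sqrt(sum((abs(c) - m) ** 2 for c, m in zip(dft(x), target_mag)))


def restore(target_mag, support, x0, iterations=100):
    """Alternate the two projections, starting from the initial pattern x0."""
    x = project_support(x0, support)
    errors = [magnitude_error(x, target_mag)]
    for _ in range(iterations):
        x = project_support(project_magnitude(x, target_mag), support)
        errors.append(magnitude_error(x, target_mag))
    return x, errors


# Toy example (hypothetical data): a two-peak pattern padded with zeros.
true_signal = [0.0, 1.0, 3.0, 1.0, 0.5, 2.0, 0.5, 0.0] + [0.0] * 8
target = [abs(c) for c in dft(true_signal)]
rng = random.Random(0)
initial = [rng.random() for _ in true_signal]
estimate, errors = restore(target, support=8, x0=initial)
```

Because the magnitude set is nonconvex, the iteration can stagnate at a pattern that satisfies neither constraint exactly, which is why the choice of initial pattern matters in this kind of scheme.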
