Predicting the intrusiveness of noise through sparse coding with auditory kernels

Highlights:
- We propose a perceptual model using sparse sound representations with auditory kernels.
- We show that the number of kernels models perceptual properties of background noise.
- The achieved average correlation with subjective noise intrusiveness scores exceeds 95%.

This paper presents a novel approach to predicting the intrusiveness of background noise in speech signals as perceived by human listeners. The problem is of particular interest in telephony, where the recently widened range of transmitted audio frequencies has increased the importance of appropriate background noise reduction strategies. Current approaches predict the average noise intrusiveness score that would be obtained in a subjective listening test by combining signal features related to physical properties (e.g., signal energy, spectral distribution) or psychoacoustic estimates (e.g., loudness) of the noise. Combining and implementing such features requires expert knowledge or the availability of training data. We present an alternative approach based on a model of efficient sound coding, using a sparse spike-coding representation of the noise. We show that the sparsity of these representations implicitly models several factors in the perception of noise and yields predictions of noise intrusiveness scores that match or outperform traditional features, without the use of training data. The evaluation datasets and performance metrics are based on standardized methods for the evaluation of quality prediction models.
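The core idea can be illustrated with a small sketch. The Python code below is not the authors' implementation; it is a minimal illustration, assuming a gammatone kernel dictionary and a greedy, shift-invariant matching pursuit (in the spirit of MPTK), in which the number of kernel instances ("spikes") needed to bring the residual energy below a target threshold serves as the sparsity measure. The kernel parameters, dictionary size, frequency spacing, and stopping criterion are all illustrative assumptions; the paper's actual kernels, decomposition settings, and mapping from spike counts to intrusiveness scores may differ.

```python
# Sketch only: sparsity of a matching-pursuit decomposition with gammatone
# kernels, used as a proxy for how "efficiently" a noise can be coded.
import numpy as np


def gammatone_kernel(fc, fs, duration=0.03, order=4):
    """Unit-energy gammatone kernel centred at fc Hz (illustrative parameters)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 + 0.108 * fc  # Glasberg & Moore ERB width (Hz)
    g = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def gammatone_dictionary(fs, n_kernels=32, f_lo=100.0, f_hi=7000.0):
    """Kernels with centre frequencies spaced logarithmically between f_lo and f_hi."""
    return [gammatone_kernel(fc, fs) for fc in np.geomspace(f_lo, f_hi, n_kernels)]


def spike_count(x, kernels, snr_target_db=10.0, max_spikes=2000):
    """Greedy shift-invariant matching pursuit.

    Returns how many kernel instances ("spikes") are needed before the residual
    energy drops snr_target_db below the input energy. Naive O(n*m) correlations
    are used for clarity; a real system would use FFT-based search or MPTK.
    """
    residual = np.asarray(x, dtype=float).copy()
    target_energy = np.sum(residual ** 2) * 10 ** (-snr_target_db / 10)
    n_spikes = 0
    while np.sum(residual ** 2) > target_energy and n_spikes < max_spikes:
        best_coeff, best_k, best_shift = 0.0, None, None
        for k, g in enumerate(kernels):
            corr = np.correlate(residual, g, mode='valid')  # inner products at all shifts
            shift = int(np.argmax(np.abs(corr)))
            if abs(corr[shift]) > abs(best_coeff):
                best_coeff, best_k, best_shift = corr[shift], k, shift
        if best_k is None:  # nothing left to explain
            break
        g = kernels[best_k]
        residual[best_shift:best_shift + len(g)] -= best_coeff * g  # remove chosen atom
        n_spikes += 1
    return n_spikes


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(fs // 2)  # 0.5 s of white noise as a stand-in signal
    kernels = gammatone_dictionary(fs)
    print("spikes needed:", spike_count(noise, kernels))
```

In this sketch, a noise that needs more kernels to reach the same residual energy is treated as less sparsely codable; this kind of sparsity measure is what the abstract relates to perceived intrusiveness.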
