Perception and automated assessment of audio quality in user generated content: An improved model

Technology to record sound, available in personal devices such as smartphones and video recorders, is now ubiquitous. However, the production quality of the sound in user-generated content is often very poor: distorted, noisy, with garbled speech or indistinct music. Our interest lies in the causes of poor recordings, especially what happens between the sound source and the electronic signal emerging from the microphone, and in finding an automated method to warn the user of such problems. Typical problems were tested: distortion, wind noise, microphone handling noise and poor frequency response. A perceptual model was developed from subjective tests on the perceived quality of such errors and from data measured on a training dataset composed of various audio files. Perceived quality is shown to be associated most strongly with distortion and frequency response, with wind and handling noise only slightly less important. In addition, the contextual content of the audio sample was found to modulate perceived quality to a degree similar to degradations such as wind noise, rendering the degradations introduced by handling noise negligible.
