A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech

A modulation spectral representation is investigated for non-intrusive quality and intelligibility measurement of reverberant and dereverberated speech. The representation is obtained by an auditory-inspired filterbank analysis of the critical-band temporal envelopes of the speech signal. Insights from the modulation spectrum are used to develop an adaptive measure termed the speech-to-reverberation modulation energy ratio (SRMR). Experimental results show that the proposed measure outperforms three standard algorithms in estimating multiple dimensions of perceived coloration, as well as in quality measurement and intelligibility estimation of reverberant and dereverberated speech.
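
The general idea of a modulation energy ratio computed from band envelopes can be sketched as follows. This is a rough illustration only, not the measure proposed in the paper: the FFT brick-wall bands stand in for an auditory filterbank, and the function names, the 20 Hz split, and the 128 Hz cap are assumptions chosen for the example.

```python
import numpy as np


def band_envelope(x, fs, f_lo, f_hi):
    """Temporal envelope of x within [f_lo, f_hi) Hz.

    Assumption: a brick-wall FFT band replaces a proper auditory filter.
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    # Keep only positive frequencies inside the band; doubling them
    # yields the analytic signal of the band-limited waveform.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.abs(analytic)


def modulation_energy_ratio(x, fs, acoustic_bands, split_hz=20.0, max_hz=128.0):
    """Ratio of low to high modulation-frequency energy, summed over bands.

    split_hz and max_hz are illustrative assumptions, not values from the text.
    """
    low = 0.0
    high = 0.0
    for f_lo, f_hi in acoustic_bands:
        env = band_envelope(x, fs, f_lo, f_hi)
        env = env - env.mean()  # drop the DC term of the envelope
        power = np.abs(np.fft.rfft(env)) ** 2
        mod_freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
        low += power[(mod_freqs > 0) & (mod_freqs < split_hz)].sum()
        high += power[(mod_freqs >= split_hz) & (mod_freqs < max_hz)].sum()
    return low / (high + 1e-12)  # small constant avoids division by zero


# Toy check: a 1 kHz tone with a slow (4 Hz) versus a fast (40 Hz) envelope.
fs = 8000
t = np.arange(fs) / fs
carrier = np.sin(2 * np.pi * 1000 * t)
slow = modulation_energy_ratio((1 + 0.8 * np.cos(2 * np.pi * 4 * t)) * carrier,
                               fs, [(800, 1200)])
fast = modulation_energy_ratio((1 + 0.8 * np.cos(2 * np.pi * 40 * t)) * carrier,
                               fs, [(800, 1200)])
print(slow > fast)
```

In this toy case the slowly modulated tone yields the larger ratio, because all of its envelope energy falls below the assumed split frequency. A practical measure would use perceptually motivated acoustic and modulation filters and real speech rather than a single tone.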
