Temporal Dynamics for Blind Measurement of Room Acoustical Parameters

In this paper, short- and long-term temporal dynamic information is investigated for the blind measurement of room acoustical parameters. In particular, estimators of room reverberation time (T60) and direct-to-reverberant energy ratio (DRR) are proposed. Short-term temporal dynamic information is obtained from differential (delta) cepstral coefficients. Statistics computed from the zeroth-order delta cepstral sequence serve as input features to a support vector T60 estimator. Long-term temporal dynamic cues, on the other hand, are obtained from an auditory spectrotemporal representation of speech commonly referred to as the modulation spectrum. A measure termed the reverberation-to-speech modulation energy ratio, computed per modulation frequency band, is proposed and serves as input to the T60 and DRR estimators. Experiments show that the proposed estimators outperform a baseline system for reverberant speech both with and without acoustic background noise. Experiments also suggest that estimators of the subjective perception of spectral coloration, reverberant tail effect, and overall speech quality can be obtained with an adaptive speech-to-reverberation modulation energy ratio measure.
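The short-term feature path described above (zeroth-order cepstral coefficient, its delta sequence, then per-utterance statistics) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the 20 ms frame and 10 ms hop, the Hamming window, the regression-style delta with a width of 2 frames, the use of the real cepstrum for c0, and the choice of variance, skewness, and kurtosis as the statistics. The function names are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zeroth_cepstral_coeff(frames, eps=1e-10):
    """Zeroth real-cepstrum coefficient of each frame.

    c0 is the quefrency-0 term of the inverse FFT of the log-magnitude
    spectrum, i.e. the average log-magnitude over all FFT bins.
    """
    window = np.hamming(frames.shape[1])
    log_mag = np.log(np.abs(np.fft.rfft(frames * window, axis=1)) + eps)
    return np.fft.irfft(log_mag, n=frames.shape[1], axis=1)[:, 0]


def delta(c, width=2):
    """Regression-style delta of a 1-D coefficient sequence.

    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width,
    with edge values repeated at the sequence boundaries (assumed choice).
    """
    padded = np.pad(c, width, mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    d = np.zeros(len(c))
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + len(c)]
        behind = padded[width - n : width - n + len(c)]
        d += n * (ahead - behind)
    return d / denom


def delta_c0_statistics(x, fs, frame_ms=20, hop_ms=10):
    """Per-utterance statistics of the zeroth-order delta cepstral sequence."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    c0 = zeroth_cepstral_coeff(frame_signal(x, frame_len, hop))
    d = delta(c0)
    z = (d - d.mean()) / d.std()
    return {
        "variance": float(d.var()),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
    }


if __name__ == "__main__":
    # Synthetic noise stands in for a speech recording (illustration only).
    rng = np.random.default_rng(0)
    print(delta_c0_statistics(rng.standard_normal(16000), fs=16000))
```

In a setup like the one the abstract describes, a feature vector of this kind would be computed per utterance and passed to a support vector regressor trained on signals with known T60; the regressor itself is omitted here.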
