A non-uniform real-time speech time-scale stretching method

An algorithm for non-uniform real-time speech stretching is presented. It combines the standard SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on the detected content of the signal and the estimated rate of speech (ROS), the algorithm adapts the value of the scaling factor. The real-time stretching capability of the method and the resulting voice quality were analysed. Subjective tests were performed to compare the quality of the proposed method with that of the standard SOLA algorithm. The accuracy of the ROS estimation was assessed to verify its robustness.
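The abstract only outlines the method, so the following Python sketch illustrates the general idea of non-uniform SOLA under stated assumptions. It is not the authors' implementation. Each analysis frame is labelled by a crude stand-in detector (RMS energy for silence, zero-crossing rate to separate vowel-like from consonant-like frames), and the label selects a per-frame scaling factor. The values in `DEFAULT_FACTORS`, the thresholds, the frame sizes and the `ros_scale` multiplier (a hypothetical hook where an ROS estimate could modulate the stretching) are all illustrative assumptions. Each frame is then placed at its nominal synthesis position, shifted by the lag that maximises the normalised cross-correlation with the existing output, and cross-faded in, as in standard SOLA.

```python
import math
from typing import Dict, List, Optional, Sequence

# Illustrative per-class factors (assumed, not taken from the paper):
# vowels are stretched most, consonants least, to protect transients.
DEFAULT_FACTORS: Dict[str, float] = {"vowel": 2.0, "silence": 1.5, "consonant": 1.2}


def classify_frame(frame: Sequence[float], silence_rms: float = 0.01,
                   zcr_threshold: float = 0.3) -> str:
    """Hypothetical stand-in detector: RMS energy flags silence, and the
    zero-crossing rate separates vowel-like (low) from consonant-like (high)."""
    n = len(frame)
    rms = math.sqrt(sum(v * v for v in frame) / n)
    if rms < silence_rms:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return "consonant" if crossings / (n - 1) > zcr_threshold else "vowel"


def best_shift(out: List[float], frame: Sequence[float], pos: int, max_shift: int) -> int:
    """Return the shift k in [-max_shift, max_shift] maximising the normalised
    cross-correlation between `frame` and the output starting at pos + k."""
    best_k, best_score = 0, -math.inf
    for k in range(-max_shift, max_shift + 1):
        p = pos + k
        overlap = min(len(out) - p, len(frame))
        if p < 0 or overlap <= 0:
            continue  # no usable overlap for this shift
        a, b = out[p:p + overlap], frame[:overlap]
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b)) or 1.0
        if num / den > best_score:
            best_k, best_score = k, num / den
    return best_k


def sola_stretch(x: Sequence[float], frame_len: int = 400, analysis_hop: int = 100,
                 max_shift: int = 50, factors: Optional[Dict[str, float]] = None,
                 ros_scale: float = 1.0) -> List[float]:
    """Non-uniform SOLA sketch: each analysis frame gets its own factor
    alpha = ros_scale * factors[label]; ros_scale is a hypothetical ROS hook."""
    factors = factors or DEFAULT_FACTORS
    out = list(x[:frame_len])
    pos = 0  # nominal (unshifted) synthesis position of the latest frame
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        start = m * analysis_hop
        frame = list(x[start:start + frame_len])
        alpha = ros_scale * factors[classify_frame(frame)]
        pos += max(1, round(alpha * analysis_hop))  # synthesis hop for this frame
        p = pos + best_shift(out, frame, pos, max_shift)
        if p > len(out):  # no overlap possible: pad the gap with zeros
            out.extend([0.0] * (p - len(out)))
        overlap = min(len(out) - p, frame_len)
        for i in range(overlap):  # linear cross-fade over the overlap region
            w = (i + 1) / (overlap + 1)
            out[p + i] = (1 - w) * out[p + i] + w * frame[i]
        out.extend(frame[overlap:])
        m += 1
    return out  # input samples after the last full frame are dropped
```

To keep a non-empty overlap, `frame_len` should exceed the largest synthesis hop plus `max_shift`; otherwise the sketch pads the gap with zeros. For a steady 200 Hz tone sampled at 8 kHz, `sola_stretch(x, frame_len=200, analysis_hop=50, max_shift=25)` labels every frame as vowel-like and returns a signal close to twice the input length. A real-time version would consume frames as they arrive instead of reading from a complete buffer.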