On the Automatic Validation of Speech Alignment

The alignment of two utterances is the basis of many speech processing applications. The acoustic user interface of such applications should be able to detect insufficient alignment results and identify the input utterances responsible. In this paper, we discuss the automatic validation of speech alignment and propose two new validation algorithms. The first method relies on locating and matching the syllable nuclei of the aligned utterances. The second method performs a syllable-level comparison of the speech signal envelopes in accordance with the alignment time-warping path. Experimental results show that the proposed algorithms perform consistently well and can be applied effectively to validate different speech alignment methods.
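To make the second idea concrete, the sketch below (a hypothetical illustration, not the authors' implementation) extracts a short-time energy envelope for each utterance, warps one envelope onto the other using an externally supplied DTW path, and flags the alignment as invalid when the envelope correlation drops below a threshold. The function names (frame_energy, validate_alignment) and the threshold value are assumptions introduced here for illustration.

```python
# Hypothetical sketch of envelope-based alignment validation; not the
# authors' method. Assumes a DTW warping path is already available.
import numpy as np
from scipy.signal import butter, filtfilt


def frame_energy(signal, sr, frame_ms=10.0):
    """Short-time energy envelope, lightly smoothed with a low-pass filter."""
    frame_len = max(1, int(sr * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    energy = np.array([
        np.sum(signal[i * frame_len:(i + 1) * frame_len] ** 2)
        for i in range(n_frames)
    ])
    # Second-order Butterworth low-pass (cut-off ~10 Hz at a 100 Hz frame rate).
    b, a = butter(2, 0.2)
    return filtfilt(b, a, energy) if len(energy) > 12 else energy


def validate_alignment(env_a, env_b, warp_path, corr_threshold=0.6):
    """Correlate the two envelopes after mapping them through the warping path.

    warp_path is a sequence of (i, j) index pairs from any DTW alignment.
    Returns (is_valid, correlation). The 0.6 threshold is an assumption.
    """
    a = np.array([env_a[i] for i, _ in warp_path], dtype=float)
    b = np.array([env_b[j] for _, j in warp_path], dtype=float)
    if a.std() == 0.0 or b.std() == 0.0:
        return False, 0.0
    corr = float(np.corrcoef(a, b)[0, 1])
    return corr >= corr_threshold, corr
```

In practice the comparison could also be restricted to individual syllable segments (e.g. between detected syllable nuclei) rather than the whole warping path, which is closer to the syllable-level comparison described in the abstract.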
