Train&align: A new online tool for automatic phonetic alignment

Several automatic phonetic alignment tools have been proposed in the literature. They usually rely on pre-trained speaker-independent models to align new corpora. Their drawback is that they cover only a limited number of languages and may not perform well across different speaking styles. This paper presents a new online tool for automatic phonetic alignment. Its distinguishing feature is that it trains the model directly on the corpus to be aligned, which makes it applicable to any language and speaking style. Experiments on three corpora show that it provides results comparable to those of existing tools. It also allows some training parameters to be tuned. Using tied-state triphones, for example, yields a further improvement of about 1.5% at a 20 ms threshold. A manually aligned part of the corpus can also be used as a bootstrap to improve model quality. Alignment rates were found to increase significantly, by up to 20%, using only 30 seconds of bootstrapping data.
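The figures quoted for a 20 ms threshold are most naturally read as an alignment rate: the share of automatically placed phone boundaries that fall within a given tolerance of the manual reference. The abstract does not define the measure, so the sketch below is an assumption rather than the tool's own evaluation code. The function name `alignment_rate`, the use of millisecond integers, and the sample boundary values are all hypothetical.

```python
def alignment_rate(auto_ms, manual_ms, threshold_ms=20):
    """Return the fraction of automatic boundaries that lie within
    `threshold_ms` of the corresponding manual boundaries.

    Both lists hold boundary times in milliseconds and must describe
    the same phone sequence, so that boundaries pair up one-to-one.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    # Count boundaries whose absolute deviation is within the tolerance.
    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= threshold_ms
    )
    return hits / len(manual_ms)


# Illustrative values only (not taken from the paper).
manual = [100, 250, 400, 620]
auto = [110, 245, 430, 621]
rate = alignment_rate(auto, manual, threshold_ms=20)
print(f"{rate:.0%} of boundaries within 20 ms")
```

Under this reading, a tighter threshold gives a stricter score, which is why a rate is only meaningful when reported together with its tolerance.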
