A novel visualization tool for manual annotation when building large speech corpora

A novel visual description of sound, called the sound dendrogram, is proposed to make manual annotation easier when building large speech corpora. It is a lattice structure built from a group of “seed regions” through an iterative merging procedure. A simple but reliable method for extracting the “seed regions” and an advanced distance metric are adopted to construct the sound dendrogram, so that it presents the structure of speech visually, from coarse to fine. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are very easy to identify. The sound dendrogram can be a powerful aid in the manual annotation of speech corpora.
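To make the merging procedure concrete, here is a minimal sketch of how a dendrogram could be grown from seed regions by repeatedly merging the two most similar adjacent regions. It is not the paper's method: the abstract specifies neither the seed extraction nor the distance metric, so the sketch assumes each seed already carries a mean feature vector, uses plain Euclidean distance as a stand-in metric, and builds a simple binary merge tree instead of a full lattice. The names `Region`, `build_dendrogram`, and `boundaries` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    start: int                 # first frame index (inclusive)
    end: int                   # last frame index (exclusive)
    mean: List[float]          # mean feature vector over the region's frames
    level: int = 0             # merge step that created the region (0 = seed)
    left: Optional["Region"] = None
    right: Optional["Region"] = None


def distance(a: Region, b: Region) -> float:
    # Assumption: Euclidean distance between region means stands in for
    # the paper's (unspecified) distance metric.
    return sum((x - y) ** 2 for x, y in zip(a.mean, b.mean)) ** 0.5


def merge(a: Region, b: Region, level: int) -> Region:
    # The merged mean is the length-weighted average of the two means.
    na, nb = a.end - a.start, b.end - b.start
    mean = [(na * x + nb * y) / (na + nb) for x, y in zip(a.mean, b.mean)]
    return Region(a.start, b.end, mean, level, a, b)


def build_dendrogram(seeds: List[Region]) -> Region:
    """Merge the closest pair of adjacent regions until one region remains."""
    regions = list(seeds)
    level = 0
    while len(regions) > 1:
        level += 1
        i = min(range(len(regions) - 1),
                key=lambda k: distance(regions[k], regions[k + 1]))
        regions[i:i + 2] = [merge(regions[i], regions[i + 1], level)]
    return regions[0]


def boundaries(root: Region, n_segments: int) -> List[int]:
    """Cut the tree into n_segments regions by undoing the latest merges first."""
    regions = [root]
    while len(regions) < n_segments:
        i = max(range(len(regions)), key=lambda k: regions[k].level)
        node = regions[i]
        if node.left is None or node.right is None:
            break              # only seed regions remain; cannot split further
        regions[i:i + 1] = [node.left, node.right]
    return [r.start for r in regions[1:]]
```

Calling `boundaries` with a growing `n_segments` moves from a coarse segmentation to a fine one, which is the coarse-to-fine view an annotator would browse when choosing phonemic boundaries.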
