A procedure for estimating gestural scores from articulatory data.

Speech can be represented as a set of discrete vocal tract constriction gestures (a gestural score) defined at functionally distinct speech organs [tract variables (TVs)]. Using such gestures as sub‐word units in an automatic speech recognition (ASR) system can address variation in speech arising from coarticulation and reduction. Because test corpora annotated with gestural scores are lacking, we develop a semi‐automatic procedure for estimating and annotating gestural scores from natural speech databases using the Haskins speech production model (TaDA). We first describe the procedure’s application to 500 words with unique phone sets found in the Wisconsin x‐ray microbeam database, generating both gestural scores and synthetic speech. Second, we perform dynamic time warping (DTW) to align the TaDA‐generated speech signals with the microbeam data. The DTW time‐scaling pattern is then used to adjust the gestural score originally input to TaDA, which generates new time‐warped TaDA acoustics. Third, we fine‐tune the gestur...
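
The second step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the acoustic features, the local distance, or the DTW step pattern, so the code assumes per-frame feature vectors, Euclidean distance, and the standard symmetric steps. The function names (dtw_path, warp_time, warp_score) and the tuple layout of the gestural score are hypothetical, and the tract-variable labels are only illustrative.

```python
from math import inf


def dtw_path(synth, natural):
    """Return a minimum-cost DTW alignment as a list of (i, j) frame pairs.

    synth and natural are sequences of equal-length feature vectors
    (assumed representation; the abstract does not specify the features).
    """
    n, m = len(synth), len(natural)
    # cost[i][j]: best accumulated cost aligning synth[:i] with natural[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (assumed metric)
            d = sum((a - b) ** 2 for a, b in zip(synth[i - 1], natural[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last cell to the first, following the cheapest predecessor.
    i, j = n, m
    path = [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def warp_time(frame, path):
    """Map a synthetic frame index to the mean of the natural frames aligned to it."""
    matched = [j for i, j in path if i == frame]
    return sum(matched) / len(matched)


def warp_score(score, path):
    """Time-warp a gestural score given as (tract_variable, onset, offset) in frames."""
    return [(tv, warp_time(on, path), warp_time(off, path)) for tv, on, off in score]


# Toy data: the natural utterance is twice as slow as the synthetic one.
synth = [[0.0], [1.0], [2.0]]
natural = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
path = dtw_path(synth, natural)
warped = warp_score([("LA", 0, 1), ("TTCD", 1, 2)], path)
print(path)
print(warped)
```

In this sketch the warped onsets and offsets are expressed in frames of the natural recording. A real pipeline would convert them to seconds and pass the adjusted score back to the synthesizer.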