A genetic approach for recovering vocal tract area functions from Spanish vowels

This paper shows how a genetic algorithm can recover vocal tract area functions from natural utterances. The data analyzed are a subset of Spanish speech signals, specifically vowels from the Venezuelan SpeechDat database of utterances, which adds to the novelty of the study. The method evolves parametric and real-coded representations of the speech articulators, with the goal of minimizing the acoustic distance to the target natural SpeechDat utterances. This distance is based on the signal's formants and a measure of the continuity of the area function. Furthermore, the genetic algorithm is implemented using a multipopulation approach, which seeks to accelerate convergence to a solution while preserving genetic diversity. The best learned functions are then provided as input to an articulatory speech synthesizer in order to generate artificial utterances that are potentially acoustically similar to the natural signals. Objective and subjective tests on these artificial signals have confirmed the effectiveness of the genetic approach.
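As a rough illustration of the kind of search described in the abstract, the following Python sketch evolves real-coded area functions with a small island-model (multipopulation) genetic algorithm. It is not the authors' implementation. Everything specific in it is an assumption made for the example: the lossless concatenated-tube model with an ideal open termination at the lips, the 17.5 cm tract length, the cost weights, the operators (binary tournament, arithmetic crossover, Gaussian mutation, ring migration), and the target formant values, which are illustrative and not taken from SpeechDat.

```python
import numpy as np

C = 350.0  # assumed speed of sound in m/s


def formants(areas, length=0.175, n=3, fmax=5000.0, df=5.0):
    """First n resonances (Hz) of a lossless tube, closed at the glottis and
    ideally open at the lips. A simplified stand-in for a real acoustic model."""
    areas = np.asarray(areas, dtype=float)
    freqs = np.arange(df, fmax, df)
    kl = 2.0 * np.pi * freqs / C * (length / len(areas))
    p, s = np.cos(kl), np.sin(kl)
    # Chain matrix [[a, j*b], [j*c, d]] per frequency; a, b, c, d stay real.
    a, b = np.ones_like(freqs), np.zeros_like(freqs)
    c, d = np.zeros_like(freqs), np.ones_like(freqs)
    for area in areas:  # sections ordered from glottis to lips
        q, r = s / area, s * area
        a, b, c, d = a * p - b * r, a * q + b * p, c * p + d * r, d * p - c * q
    # Resonances are the zeros of d; locate sign changes and interpolate.
    idx = np.nonzero(np.signbit(d[:-1]) != np.signbit(d[1:]))[0][:n]
    return freqs[idx] - d[idx] * df / (d[idx + 1] - d[idx])


def cost(areas, target, lam=0.05):
    """Assumed cost: relative formant error plus a roughness (continuity) penalty."""
    f = formants(areas, n=len(target))
    if len(f) < len(target):
        return np.inf
    formant_err = np.mean(np.abs(f - target) / target)
    roughness = np.mean(np.diff(np.log(areas)) ** 2)
    return formant_err + lam * roughness


def evolve(target, n_sections=8, n_islands=3, pop=20, gens=40,
           migrate_every=10, lo=0.3, hi=8.0, seed=0):
    """Island-model GA over real-coded area vectors (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    islands = [rng.uniform(lo, hi, (pop, n_sections)) for _ in range(n_islands)]
    for g in range(1, gens + 1):
        for i, P in enumerate(islands):
            fit = np.array([cost(x, target) for x in P])
            elite = P[np.argmin(fit)].copy()
            # Binary tournament selection.
            a, b = rng.integers(pop, size=(2, pop))
            parents = np.where((fit[a] < fit[b])[:, None], P[a], P[b])
            # Arithmetic crossover with a randomly permuted mate.
            w = rng.random((pop, 1))
            kids = w * parents + (1.0 - w) * parents[rng.permutation(pop)]
            # Gaussian mutation applied to roughly 10% of the genes.
            kids += rng.normal(0.0, 0.3, kids.shape) * (rng.random(kids.shape) < 0.1)
            kids = np.clip(kids, lo, hi)
            kids[0] = elite  # elitism: keep the island's best unchanged
            islands[i] = kids
        if g % migrate_every == 0:
            bests = [P[np.argmin([cost(x, target) for x in P])].copy()
                     for P in islands]
            for i in range(n_islands):
                islands[i][-1] = bests[i - 1]  # ring migration into the last slot
    everyone = np.vstack(islands)
    fits = np.array([cost(x, target) for x in everyone])
    return everyone[np.argmin(fits)], fits.min()


# Illustrative target formants in Hz (not measured values from the paper).
best_areas, best_cost = evolve(np.array([700.0, 1300.0, 2500.0]))
```

In this simplified tube model only the ratios between neighbouring areas affect the resonances, so the area units are arbitrary. The roughness term plays the role of the continuity measure, and periodic migration between islands is one common way to share good solutions while each subpopulation keeps its own diversity.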
