Genetic learning of vocal tract area functions for articulatory synthesis of Spanish vowels

This paper describes a multipopulation real-coded genetic approach for recovering vocal tract area functions from speech data. The data analyzed are a subset of Spanish speech signals, specifically vowels from the Venezuelan SpeechDat database of utterances, a choice of corpus that adds to the novelty of the study. The method evolves parametric representations of the speech articulators, with the goal of minimizing the acoustic distance to target natural SpeechDat utterances. This distance is based on the signal's formants and on a measure of the continuity of the area function. The best learned functions are then provided as input to an articulatory speech synthesizer in order to generate artificial utterances that are intended to be acoustically similar to the natural signals. Objective and subjective tests on these artificial signals have confirmed the effectiveness of the genetic approach.
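The search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the operators, the parametrization, or the weighting, so everything here is an assumption. The cost adds a relative formant error to a weighted roughness term for the area function, and several subpopulations evolve in parallel with occasional migration of their best individuals around a ring. The names `continuity_penalty`, `formant_distance`, `fitness`, `evolve_islands`, and `toy_formants`, the weight `0.1`, the operator choices (truncation selection, arithmetic crossover, Gaussian mutation), and the target values are all hypothetical. In particular, `toy_formants` is an arbitrary placeholder and not an acoustic model; a real system would compute formants from the area function with a tube model or an articulatory synthesizer.

```python
import random

def continuity_penalty(areas):
    """Mean squared difference between adjacent section areas (0 for a flat tube)."""
    return sum((b - a) ** 2 for a, b in zip(areas, areas[1:])) / (len(areas) - 1)

def formant_distance(candidate, target):
    """Mean relative error between candidate and target formant frequencies."""
    return sum(abs(c - t) / t for c, t in zip(candidate, target)) / len(target)

def fitness(areas, target_formants, formants_of, weight=0.1):
    """Cost to minimize: formant mismatch plus weighted roughness (weight is assumed)."""
    mismatch = formant_distance(formants_of(areas), target_formants)
    return mismatch + weight * continuity_penalty(areas)

def evolve_islands(cost, n_genes, bounds, n_islands=3, pop_size=20,
                   generations=60, migrate_every=10, seed=0):
    """Real-coded GA over several subpopulations with ring migration."""
    rng = random.Random(seed)
    lo, hi = bounds
    islands = [[[rng.uniform(lo, hi) for _ in range(n_genes)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(islands):
            pop.sort(key=cost)
            elite = pop[: pop_size // 2]  # truncation selection keeps the best half
            children = []
            while len(elite) + len(children) < pop_size:
                p1, p2 = rng.sample(elite, 2)
                child = []
                for g1, g2 in zip(p1, p2):
                    alpha = rng.random()
                    gene = alpha * g1 + (1 - alpha) * g2        # arithmetic crossover
                    gene += rng.gauss(0.0, 0.05 * (hi - lo))    # Gaussian mutation
                    child.append(min(hi, max(lo, gene)))        # clamp to bounds
                children.append(child)
            islands[i] = elite + children
        if gen % migrate_every == 0:
            # Each island's best replaces the worst individual of the next island.
            bests = [min(pop, key=cost) for pop in islands]
            for i, pop in enumerate(islands):
                worst = max(range(pop_size), key=lambda k: cost(pop[k]))
                pop[worst] = list(bests[(i - 1) % n_islands])
    return min((ind for pop in islands for ind in pop), key=cost)

def toy_formants(areas):
    """Placeholder mapping, NOT an acoustic model; constants are arbitrary."""
    half = len(areas) // 2
    back = sum(areas[:half]) / half
    front = sum(areas[half:]) / (len(areas) - half)
    return [300.0 + 100.0 * back, 1000.0 + 200.0 * front]

TARGET = [600.0, 1600.0]  # hypothetical target formants in Hz

def cost(areas):
    return fitness(areas, TARGET, toy_formants)

best = evolve_islands(cost, n_genes=6, bounds=(0.5, 6.0))
print("best areas:", [round(a, 2) for a in best])
print("best cost:", round(cost(best), 4))
```

Keeping the better half of each subpopulation unchanged means the best cost never gets worse from one generation to the next. The continuity term steers the search away from jagged area functions that happen to match the formants, which is one plausible reading of why the abstract pairs the two criteria.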
