Acoustic characteristics of speaker individuality: Control and conversion

Abstract: This paper reviews recent studies on voice quality control and conversion technologies. After briefly summarizing basic scientific findings on the acoustic correlates of speech individuality, we review the latest developments in speech technologies for voice control and for copying speaker characteristics. The main focus is a survey of non-parametric methods for mapping segmental spectral characteristics between speakers, covering several types of spectral mapping methods that have evolved alongside the speaker adaptation techniques developed in speech recognition research.
