Transforming Perceived Vocal Effort and Breathiness Using Adaptive Pre-Emphasis Linear Prediction

This paper presents a technique for transforming high-effort voices into breathy voices using adaptive pre-emphasis linear prediction (APLP). The primary benefit of APLP is that it estimates a spectral emphasis filter that can be used to manipulate perceived vocal effort. A second benefit is that it estimates a formant filter that is more consistent across varying voice qualities. The paper first describes how constant pre-emphasis linear prediction (LP) estimates a voice source with a constant spectral envelope, even though the spectral envelope of the true voice source varies over time. A listening experiment demonstrates that differences in vocal effort and breathiness are audible in the formant filter estimated by constant pre-emphasis LP. APLP is then presented as a technique for estimating a spectral emphasis filter that captures the combined influence of the glottal source and the vocal tract on the spectral envelope of the voice. A final listening experiment demonstrates that APLP can effectively transform high-effort voices into breathy voices. The techniques presented here are relevant to researchers in voice conversion, voice quality, singing, and emotion.
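The contrast between constant and adaptive pre-emphasis can be illustrated with a minimal sketch. It is not the paper's APLP algorithm: the abstract does not specify how the emphasis filter is estimated, so the sketch assumes a first-order emphasis filter whose coefficient is taken per frame from a first-order LP fit (r[1]/r[0]). The function names (`analyse_frame`, `pre_emphasize`, `levinson`), the Hamming window, and the constant value 0.97 are all illustrative assumptions.

```python
import numpy as np

def autocorr(x, order):
    """Autocorrelation lags r[0..order] of a frame."""
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson(r, order):
    """Levinson-Durbin recursion; returns A(z) coefficients [1, a1, ..., ap]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def pre_emphasize(x, mu):
    """Apply the first-order emphasis filter y[n] = x[n] - mu * x[n-1]."""
    y = np.asarray(x, dtype=float).copy()
    y[1:] -= mu * np.asarray(x, dtype=float)[:-1]
    return y

def analyse_frame(frame, order, mu=None):
    """Return (mu, a): emphasis coefficient and formant-filter polynomial.

    mu given -> constant pre-emphasis LP.
    mu=None  -> ASSUMED adaptive variant: mu is the first-order LP
                coefficient r[1]/r[0] of the windowed frame.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(len(frame))
    if mu is None:
        r = autocorr(frame * window, 1)
        mu = r[1] / r[0] if r[0] > 0.0 else 0.0
    emphasized = pre_emphasize(frame, mu) * window
    a = levinson(autocorr(emphasized, order), order)
    return mu, a
```

Under these assumptions, `analyse_frame(frame, order=12, mu=0.97)` applies the same spectral tilt correction to every frame, whereas `analyse_frame(frame, order=12)` lets the correction follow the tilt of each frame, so that tilt is kept in a separate emphasis coefficient rather than being absorbed by the formant filter.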
