A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis

We formulated and implemented a procedure to generate aliasing-free excitation source signals. It applies a new antialiasing filter in the continuous-time domain and then an IIR digital filter for response equalization. We introduced a general, cosine-series-based design procedure for the new antialiasing function. We applied the procedure to implement an antialiased Fujisaki-Ljungqvist model and to revise our previous implementation of the antialiased Fant-Liljencrants model. Combining these signals with a lattice implementation of the time-varying vocal tract model provides a reliable and flexible basis for testing fundamental frequency (fo) extractors and source aperiodicity analysis methods. MATLAB implementations of these antialiased excitation source models are available as part of our open-source tools for speech science.
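To make the idea of a cosine-series antialiasing function concrete, here is a minimal Python sketch. It is not the function designed in this work: as a stand-in it uses Nuttall's published 4-term minimum-sidelobe coefficients, and the names `cosine_series_kernel` and `smoothed_pulse_train`, the half width of 4 samples, and the choice of a plain pulse train as the source are all assumptions made for illustration. The sketch evaluates the continuous-time kernel at the fractional offsets between pulse positions and sample instants; the IIR equalization stage and the glottal source models themselves are left out.

```python
import numpy as np

# Stand-in coefficients (assumption): Nuttall's 4-term minimum-sidelobe window.
# The antialiasing function designed in the paper uses its own coefficients.
NUTTALL4 = (0.3635819, 0.4891775, 0.1365995, 0.0106411)


def cosine_series_kernel(t, half_width, coeffs=NUTTALL4):
    """Evaluate sum_k a_k * cos(pi * k * t / half_width) on |t| <= half_width.

    The kernel is zero outside that interval. `t` and `half_width` share
    the same unit (samples in this sketch).
    """
    t = np.asarray(t, dtype=float)
    k = np.arange(len(coeffs))
    values = np.cos(np.pi * np.outer(t.ravel(), k) / half_width) @ np.asarray(coeffs)
    values = values.reshape(t.shape)
    return np.where(np.abs(t) <= half_width, values, 0.0)


def smoothed_pulse_train(f0, fs, duration, half_width_samples=4.0):
    """Pulse train whose pulses sit at non-integer sample positions.

    Each pulse is replaced by the continuous-time kernel, sampled at the
    integer sample instants that fall inside its support.
    """
    n_samples = int(round(duration * fs))
    out = np.zeros(n_samples)
    period = fs / f0  # pulse spacing in samples, generally not an integer
    for p in np.arange(0.0, n_samples, period):
        lo = max(int(np.ceil(p - half_width_samples)), 0)
        hi = min(int(np.floor(p + half_width_samples)), n_samples - 1)
        idx = np.arange(lo, hi + 1)
        out[idx] += cosine_series_kernel(idx - p, half_width_samples)
    return out


# Example: 50 ms of a 441.3 Hz pulse train at 44.1 kHz.
signal = smoothed_pulse_train(441.3, 44100.0, 0.05)
```

Because the kernel is defined in continuous time, the pulse positions do not need to fall on the sampling grid, which is what lets the fundamental frequency vary smoothly without the pulse shape jumping from one period to the next.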
