We present a model for the generation of low-frequency, human-like pitch deviation. We take f0 measurements from vocalists producing a fixed 300 Hz tone without vibrato and find that smaller regions are evident, each with a quasi-Gaussian distribution. We present a function to implement this with a PSOLA pitch-shifting algorithm, providing natural-sounding enhancements to singing voice synthesis systems.

1 Background

In this study, fundamental frequency (f0) control in the singing voice is investigated and a novel technique for the stochastic production of drift is suggested. Drift is the low-frequency, involuntary modulation of f0 during phonation. In singing, this corresponds to non-periodic fluctuations occurring below 5 Hz, which, in Western music, sit just below the frequency range of vibrato.

Drift has been linked closely with physiological and psychological aspects of speech production. Physiologically, Orlikoff and Baken [1] suggest that the laryngeal muscles and the beating of the heart are influential in the stability of a singer’s f0 contour, whereas Burnett [2] suggests that auditory feedback also has a psychoacoustic influence on the mechanism of f0 control.

At present, singing voice synthesis and transformation systems are being used extensively to generate new expressive musical instruments and effects [3][4]. We have found that by adding low-frequency modulation based on a human process, voices sound less robotic and closer to that of an actual singer. Humanisation of this modulation technique ultimately provides enhancements to the field of singing synthesis by contributing to more natural-sounding voices.

1.1 Previous methods

Drift in singing voice synthesis is acknowledged in numerous papers; however, most authors approach it as a minor addition to the synthesis procedure. Studies such as Macon’s [5] stress the requirement for human error in the f0 contour.
This is reflected in the perceptual experiments undertaken by Saitou [6], in which the least natural-sounding synthesized singing voices are those with a smoothed f0 contour. This smoothing removes the majority of involuntary pitch deviation. To re-synthesize this deviation, a low-pass filter is applied to a noise signal with a Gaussian distribution, and the result is added to the waveform at the end of the process. Alternative methods for generating the low-frequency modulation, used by both Lai [7] and Macon [5], are derived
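The low-pass-filtered Gaussian noise approach attributed to Saitou [6] above can be illustrated with a minimal sketch. It is not the implementation from any of the cited papers: the one-pole filter, the 100 Hz frame rate, the 5 Hz cutoff, the depth in cents, and the function names `lowpass_gaussian_drift` and `apply_drift` are all assumptions chosen for illustration. The drift is applied here to an f0 contour rather than directly to a waveform, which is also an assumption of the sketch.

```python
import math
import random


def lowpass_gaussian_drift(n_frames, frame_rate_hz=100.0, cutoff_hz=5.0,
                           depth_cents=10.0, seed=None):
    """Return n_frames of f0 drift (in cents) from low-pass-filtered Gaussian noise.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Smoothing coefficient of a one-pole low-pass filter for the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    state = 0.0
    drift = []
    for _ in range(n_frames):
        # Feed unit-variance Gaussian noise through the filter.
        state += alpha * (rng.gauss(0.0, 1.0) - state)
        drift.append(state)
    # Centre the result and rescale it so its standard deviation is depth_cents.
    mean = sum(drift) / n_frames
    std = math.sqrt(sum((d - mean) ** 2 for d in drift) / n_frames)
    if std == 0.0:
        return [0.0] * n_frames
    return [depth_cents * (d - mean) / std for d in drift]


def apply_drift(f0_hz, drift_cents):
    """Shift each f0 value (Hz) by the matching drift value (cents)."""
    return [f * 2.0 ** (c / 1200.0) for f, c in zip(f0_hz, drift_cents)]


# Usage: perturb a flat 300 Hz contour, five seconds long at 100 frames per second.
flat_contour = [300.0] * 500
drifted_contour = apply_drift(flat_contour, lowpass_gaussian_drift(500, seed=1))
```

A one-pole filter is used only to keep the sketch dependency-free; a steeper low-pass design would confine the modulation more tightly to the sub-5 Hz range that the text associates with drift.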
[1] Wen-Hsing Lai. F0 Control Model for Mandarin Singing Voice Synthesis. 2007 Second International Conference on Digital Telecommunications (ICDT'07), 2007.
[2] Masashi Unoki, et al. Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis. Speech Commun., 2005.
[3] Alex Loscos, et al. Emulating Rough and Growl Voice in Spectral Domain. 2004.
[4] Mark A. Clements, et al. Concatenation-Based MIDI-to-Singing Voice Synthesis. 1997.
[5] R. Orlikoff, et al. Fundamental frequency modulation of the human voice by the heartbeat: preliminary results and possible mechanisms. The Journal of the Acoustical Society of America, 1989.
[6] Jordi Janer, et al. Transforming Singing Voice Expression - The Sweetness Effect. 2004.
[7] C. Larson, et al. Voice F0 responses to pitch-shifted auditory feedback: a preliminary study. Journal of Voice: official journal of the Voice Foundation, 1997.
[8] E. R. Golder, et al. The Box‐Müller Method for Generating Pseudo‐Random Normal Deviates. 1976.
[9] D. Klatt, et al. Analysis, synthesis, and perception of voice quality variations among female and male talkers. The Journal of the Acoustical Society of America, 1990.
[10] Xuejing Sun, et al. Pitch determination and voice quality analysis using Subharmonic-to-Harmonic Ratio. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.