Spectral Envelope Transformation in Singing Voice for Advanced Pitch Shifting

The aim of the present work is to take a step towards more natural pitch-shifting techniques for the singing voice, with applications in music production and entertainment systems. In this paper, we present an advanced method that achieves natural-sounding modifications when pitch shifting is applied to the singing voice, by modifying the spectral envelope of the audio excerpt. To this end, an all-pole model was chosen to represent the spectral envelope, which is estimated using constrained non-linear optimization. The global variations of the spectral envelope were analyzed by tracking how the parameters of the model change as the pitch changes. Using the resulting spectral envelope transformation functions, we applied our pitch-shifting scheme to a set of sustained vowels and compared the results with the same transformation performed using the Flex Pitch plugin of Logic Pro X and the pitch-synchronous overlap-add (PSOLA) technique. This comparison was carried out by means of both an objective and a subjective evaluation, the latter through a survey open to volunteers on our website.
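The abstract does not spell out how the all-pole envelope is parameterized or what form the pitch-dependent transformation functions take. The Python sketch below illustrates one plausible reading under explicitly assumed choices: the all-pole envelope is written as a cascade of second-order resonators (formant frequency and bandwidth pairs), each parameter is assumed to vary linearly with f0 and is fitted by ordinary least squares, and a shifted note keeps its excitation (harmonic level minus envelope) while taking the envelope predicted at the target pitch. The functions `fit_linear_vs_f0`, `predict_params` and `reshape_harmonics` are hypothetical illustrations, not the paper's method, and the constrained non-linear optimization used to estimate the envelope is not reproduced here.

```python
import numpy as np


def resonator_poles(freqs_hz, bws_hz, fs):
    """Map (formant frequency, bandwidth) pairs to complex pole locations.

    A second-order resonator with centre frequency F and bandwidth B has
    poles at r * exp(+/- j*2*pi*F/fs) with radius r = exp(-pi*B/fs).
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    bws = np.asarray(bws_hz, dtype=float)
    radius = np.exp(-np.pi * bws / fs)
    theta = 2.0 * np.pi * freqs / fs
    return radius * np.exp(1j * theta)


def all_pole_envelope_db(freqs_hz, bws_hz, eval_hz, fs, gain=1.0):
    """Magnitude in dB of an all-pole filter built from conjugate pole pairs."""
    poles = resonator_poles(freqs_hz, bws_hz, fs)
    z = np.exp(1j * 2.0 * np.pi * np.asarray(eval_hz, dtype=float) / fs)
    # |H(z)| = gain / prod_k |1 - p_k z^-1| * |1 - conj(p_k) z^-1|
    denom = np.ones_like(z)
    for p in poles:
        denom = denom * (1.0 - p / z) * (1.0 - np.conj(p) / z)
    return 20.0 * np.log10(gain / np.abs(denom))


def fit_linear_vs_f0(f0_hz, values):
    """Hypothetical transformation function: least-squares fit of
    values ~ a + b * f0, one column per model parameter.

    Returns an array whose row 0 holds intercepts and row 1 holds slopes.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    design = np.column_stack([np.ones_like(f0), f0])
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return coef


def predict_params(coef, f0_hz):
    """Evaluate the fitted linear functions at a given pitch."""
    return coef[0] + coef[1] * f0_hz


def reshape_harmonics(harm_db, f0_src, f0_dst, coef_freqs, coef_bws, fs):
    """Hypothetical source-filter re-weighting of harmonic levels (dB).

    The excitation is the measured level of harmonic k minus the envelope
    predicted at the source pitch; the shifted harmonic k * f0_dst then
    receives the envelope predicted at the target pitch. Harmonics whose
    new frequency exceeds fs / 2 should be discarded by the caller.
    """
    harm_db = np.asarray(harm_db, dtype=float)
    k = np.arange(1, harm_db.size + 1)
    env_src = all_pole_envelope_db(predict_params(coef_freqs, f0_src),
                                   predict_params(coef_bws, f0_src),
                                   k * f0_src, fs)
    env_dst = all_pole_envelope_db(predict_params(coef_freqs, f0_dst),
                                   predict_params(coef_bws, f0_dst),
                                   k * f0_dst, fs)
    excitation_db = harm_db - env_src
    return excitation_db + env_dst
```

For example, given formant tracks measured on a few sustained notes and fitted with `fit_linear_vs_f0`, a call such as `reshape_harmonics(levels_db, 262.0, 330.0, coef_f, coef_b, 16000.0)` would return the harmonic levels to impose after shifting a note from roughly C4 to E4. Separating excitation from envelope is what distinguishes this kind of scheme from PSOLA, which moves the waveform periods without explicitly reshaping the envelope.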
