Radial Basis Function Networks for Conversion of Sound Spectra

In many advanced signal processing tasks, such as pitch shifting, voice conversion, or sound synthesis, accurate spectral processing is required. Here, Radial Basis Function Networks (RBFNs) are proposed for modeling the spectral changes (or conversions) associated with the control of important sound parameters, such as pitch or intensity. The conversion functions are identified by a procedure that learns the shape of the conversion from a few pairs of target spectra drawn from a data set. The generalization properties of RBFNs provide interpolation across the pitch range. In constructing the training set, mel-cepstral encoding of the spectrum is used to capture the perceptually most relevant spectral changes. Moreover, a singular value decomposition (SVD) approach is used to reduce the dimension of the conversion functions. The resulting RBFN conversion functions are characterized by a fast, perceptually based training procedure, desirable interpolation properties, and computational efficiency.
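The pipeline described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes Gaussian basis functions centered on the training pitches, a hand-picked width, plain least squares for the output weights (swapped in for whatever training procedure the paper uses), and synthetic 12-coefficient vectors standing in for mel-cepstral targets. All function names and numeric values are hypothetical.

```python
import numpy as np

def gaussian_design(x, centers, width):
    """Design matrix of Gaussian radial basis functions plus a bias column."""
    d = x[:, None] - centers[None, :]
    phi = np.exp(-(d ** 2) / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def fit_rbfn(x, Y, centers, width, rank):
    """Fit RBFN output weights on SVD-reduced targets (assumed scheme)."""
    mean = Y.mean(axis=0)
    # SVD of the centered targets; keep the leading `rank` directions.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:rank]                    # shape (rank, n_coef)
    Z = (Y - mean) @ basis.T             # reduced targets, shape (n, rank)
    # Least-squares fit of the output weights.
    W, *_ = np.linalg.lstsq(gaussian_design(x, centers, width), Z, rcond=None)
    return {"W": W, "basis": basis, "mean": mean,
            "centers": centers, "width": width}

def predict_rbfn(model, x):
    """Map control values (here: normalized pitch) to coefficient vectors."""
    phi = gaussian_design(x, model["centers"], model["width"])
    return phi @ model["W"] @ model["basis"] + model["mean"]

# Synthetic stand-in data: 8 normalized pitches, 12 coefficients each.
pitches = np.linspace(0.0, 1.0, 8)
k = np.arange(12)
profile_a = np.cos(0.3 * k)
profile_b = 1.0 / (1.0 + k)
Y = (np.sin(2.0 * pitches)[:, None] * profile_a
     + (pitches ** 2)[:, None] * profile_b)

model = fit_rbfn(pitches, Y, centers=pitches, width=0.3, rank=2)
train_error = np.max(np.abs(predict_rbfn(model, pitches) - Y))

# Interpolate at a pitch that was not in the training set.
interpolated = predict_rbfn(model, np.array([0.5]))
```

In this sketch the synthetic targets vary along only two directions, so two SVD components reconstruct them exactly. Real mel-cepstral data would need a rank chosen from the singular value spectrum, and the width and centers would need tuning.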
