Speech spectral envelope estimation through explicit control of peak evolution in time

This work proposes a new approach to estimating the speech spectral envelope, suited to applications that require time-varying spectral modifications, such as voice conversion. In particular, we represent the spectral envelope as a sum of peaks that evolve smoothly in time within a phoneme. This representation provides a flexible model of the spectral envelope that is relevant to both human speech production and perception. We highlight important properties of the proposed spectral envelope estimation as applied to natural speech, and compare its results with those of a more traditional frame-by-frame cepstrum-based analysis. Subjective evaluations and comparisons of synthesized speech quality, as well as implications of this work for future research, are also discussed.
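
To make the idea of an envelope built from a sum of smoothly evolving peaks concrete, the following is a minimal illustrative sketch and not the estimation method of this work. It assumes Gaussian-shaped peaks and linear interpolation of each peak's center frequency, amplitude, and bandwidth across the frames of one phoneme; the peak shape, the interpolation rule, the function names (`envelope`, `peak_tracks`), and all numeric values are hypothetical choices made only for illustration.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def envelope(freqs_hz, peaks):
    """Evaluate a sum-of-peaks envelope on a frequency grid.

    peaks: list of (center_hz, amplitude, bandwidth_hz) tuples.
    Assumption: each peak is Gaussian-shaped (illustrative only).
    """
    return [
        sum(amp * math.exp(-0.5 * ((f - c) / bw) ** 2) for c, amp, bw in peaks)
        for f in freqs_hz
    ]


def peak_tracks(start_peaks, end_peaks, n_frames):
    """Interpolate every peak parameter from start to end over n_frames.

    Assumption: linear interpolation stands in for "smooth evolution".
    """
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        frames.append([
            tuple(lerp(s, e, t) for s, e in zip(sp, ep))
            for sp, ep in zip(start_peaks, end_peaks)
        ])
    return frames


# Hypothetical peak parameters at the start and end of one phoneme.
start = [(700.0, 1.0, 80.0), (1200.0, 0.6, 100.0)]
end = [(300.0, 1.0, 80.0), (2300.0, 0.6, 100.0)]

freqs = [50.0 * k for k in range(81)]  # 0 to 4000 Hz in 50 Hz steps
frames = peak_tracks(start, end, n_frames=5)
envelopes = [envelope(freqs, peaks) for peaks in frames]
```

Because each frame's envelope is generated from a small set of peak parameters, a time-varying modification can be expressed by editing the parameter trajectories rather than each frame independently, in contrast to a frame-by-frame analysis.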