Decomposition of the speech signal into short-time waveforms using spectral segmentation

This paper considers the representation of the speech signal by a set of discrete elements that respect its acoustic and perceptual structure. The signal is pre-analyzed frame by frame, and the spectral envelope obtained for each frame is segmented into regions, each containing a single peak. The signal is then filtered in each region, and the elementary waveforms are located in the time domain. This circumvents the problem of grouping the waveforms in adjacent channels. Both the resulting representation and the signal reconstruction are satisfactory, except for some modeling problems that remain in the lowest part of the spectrum.
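
The analysis steps described above (frame-wise envelope, segmentation into single-peak regions, filtering per region) can be illustrated with a minimal sketch. It is not the method of the paper: the envelope estimator (cepstral smoothing), the cut rule (local minima of the envelope), the ideal FFT-mask filter, and all function and parameter names are assumptions chosen for illustration. The time-domain location of the elementary waveforms is not covered.

```python
import numpy as np


def spectral_envelope(frame, n_cep=20):
    """Smoothed log-magnitude envelope of one frame.

    Cepstral smoothing is an assumed estimator; the source does not say
    how the envelope is obtained. Requires n_cep >= 2 and an even frame length.
    """
    n = len(frame)
    windowed = frame * np.hanning(n)
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-9)
    cepstrum = np.fft.irfft(log_mag, n=n)
    # Keep only the low-quefrency coefficients (symmetric lifter).
    lifter = np.zeros(n)
    lifter[:n_cep] = 1.0
    lifter[-(n_cep - 1):] = 1.0
    return np.fft.rfft(cepstrum * lifter).real


def segment_single_peak_regions(envelope):
    """Cut the bin axis at local minima of the envelope.

    Returns half-open (start, stop) bin ranges. For an envelope without
    flat plateaus, each range contains exactly one peak.
    """
    interior = np.arange(1, len(envelope) - 1)
    is_min = (envelope[interior] < envelope[interior - 1]) & (
        envelope[interior] <= envelope[interior + 1]
    )
    edges = np.concatenate(([0], interior[is_min], [len(envelope)]))
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


def filter_region(frame, region):
    """Band-limit the frame to one region by zeroing FFT bins outside it.

    An ideal (brick-wall) mask is an assumption made for brevity.
    """
    spectrum = np.fft.rfft(frame)
    mask = np.zeros(len(spectrum))
    mask[region[0]:region[1]] = 1.0
    return np.fft.irfft(spectrum * mask, n=len(frame))
```

A typical use would be `regions = segment_single_peak_regions(spectral_envelope(frame))` followed by `bands = [filter_region(frame, r) for r in regions]`. Because the regions partition the bin axis, the band signals of this sketch sum back to the frame exactly. That property comes from the ideal masks and says nothing about the reconstruction quality reported above, which concerns resynthesis from the elementary waveforms.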