A dynamic tracking model of spectral shape: The sequential grouping based on timbre

This study proposes a new model for dynamically tracking changes in spectral shape. In previous computational implementations of auditory scene analysis, sequential grouping has not been fully realized. Because the new model can track spectral shape, it makes sequential grouping based on timbre feasible. First, a spectral envelope is converted into a series of frequencies using the IFIS (inverse function of integrated spectrum) [Ohmuro et al., Tech. Rep. Speech Acoust. Soc. Jpn. SP89‐72 (1992) (in Japanese)]. The IFIS has good interpolation and extrapolation characteristics. Spectral shape is thus represented as a set of frequencies on the IFIS axis. Next, these frequencies are tracked with the FM‐tracking model [Aikawa et al., J. Acoust. Soc. Am. 98, 2926(A) (1995)]. Aikawa et al.’s model represents the perceptual characteristics of sweep tones and is described by a second‐order AR model. Finally, the frequencies are converted back into a spectrum. Furthermore, this new model has an addi...
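The three-stage pipeline (envelope to IFIS frequencies, AR(2) tracking, frequencies back to a spectrum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. It assumes the IFIS is the inverse of the normalized cumulative (integrated) spectrum, sampled at equally spaced levels in (0, 1). The tracker is a hypothetical predictor-corrector built on a generic second-order AR predictor. Its coefficients `a1=2, a2=-1` (constant-velocity extrapolation) and the `gain` are illustrative choices, not the parameters of Aikawa et al.'s FM-tracking model. The names `ifis`, `inverse_ifis`, and `ar2_track` are hypothetical.

```python
import numpy as np

def ifis(envelope, freqs, n_points):
    """Map a spectral envelope to n_points frequencies on the IFIS axis.

    Assumption: IFIS = inverse of the normalized cumulative spectrum,
    sampled at equally spaced levels k / (n_points + 1).
    """
    freqs = np.asarray(freqs, dtype=float)
    # Small floor keeps the cumulative curve strictly increasing.
    env = np.maximum(np.asarray(envelope, dtype=float), 1e-12)
    steps = 0.5 * (env[1:] + env[:-1]) * np.diff(freqs)  # trapezoid areas
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    cum /= cum[-1]  # integrated spectrum normalized to [0, 1]
    levels = np.arange(1, n_points + 1) / (n_points + 1)
    # Inverse function: swap the roles of x and y in linear interpolation.
    return np.interp(levels, cum, freqs)

def inverse_ifis(points, freqs):
    """Rebuild a (normalized) spectral envelope from IFIS frequencies."""
    freqs = np.asarray(freqs, dtype=float)
    points = np.sort(np.asarray(points, dtype=float))
    levels = np.arange(1, len(points) + 1) / (len(points) + 1)
    # Pin the cumulative curve to 0 and 1 at the band edges.
    xp = np.concatenate(([freqs[0]], points, [freqs[-1]]))
    fp = np.concatenate(([0.0], levels, [1.0]))
    cum = np.interp(freqs, xp, fp)
    # The envelope is the derivative of the integrated spectrum.
    return np.gradient(cum, freqs)

def ar2_track(observed, a1=2.0, a2=-1.0, gain=0.5):
    """Hypothetical AR(2) predictor-corrector over frames (axis 0).

    Predict from the two previous tracked values, then move part of
    the way toward the new observation.
    """
    obs = np.asarray(observed, dtype=float)
    out = np.empty_like(obs)
    out[:2] = obs[:2]
    for n in range(2, len(obs)):
        pred = a1 * out[n - 1] + a2 * out[n - 2]
        out[n] = pred + gain * (obs[n] - pred)
    return out

# Usage: a synthetic spectral peak gliding from 1 kHz to 3 kHz.
freqs = np.linspace(0.0, 8000.0, 513)
frames = [np.exp(-((freqs - c) / 400.0) ** 2) + 0.05
          for c in np.linspace(1000.0, 3000.0, 20)]
points = np.array([ifis(f, freqs, 8) for f in frames])  # (frames, points)
tracked = ar2_track(points)
recon = inverse_ifis(tracked[-1], freqs)
```

With `a1=2, a2=-1, gain=0.5` the tracker's homogeneous poles are 0.5 ± 0.5i (magnitude about 0.71), so it is stable and follows a linearly gliding trajectory without steady-state lag. Because the reconstructed envelope is the derivative of a normalized cumulative curve, it integrates to 1; any level information would have to be carried separately.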