Stochastic perceptual speech models with durational dependence

In Morgan et al. (1994), we developed a statistical model of speech recognition that emphasizes the perceptually relevant, information-rich portions of the speech signal. In that model, speech is viewed as a sequence of elementary decisions, or auditory events (avents), made in response to loci of significant spectral change. These decision points are interleaved with periods during which insufficient information has accumulated to make the next decision. We call this a stochastic perceptual avent model, or SPAM. In the work reported here, we have extended our initial experimental implementation to include further probabilistic dependencies specified in the original theory, in particular the dependence on the time elapsed between the previous hypothesized avent and the current frame.