A Novel Initialization Method for Unsupervised Learning of Acoustic Patterns in Speech (FGNT-2013-01)

In this paper we present a novel initialization method for unsupervised learning of acoustic patterns in recordings of continuous speech. The pattern discovery task is solved by dynamic time warping whose performance we improve by a smart starting point selection. This enables a more accurate discovery of patterns compared to conventional approaches. After graph-based clustering the patterns are employed for training hidden Markov models for an unsupervised speech acquisition. By iterating between model training and decoding in an EM-like framework the word accuracy is continuously improved. On the TIDIGITS corpus we achieve a word error rate of about 13 percent by the proposed unsupervised pattern discovery approach, which neither assumes knowledge of the acoustic units nor of the labels of the training data.