A NOVEL INITIALIZATION METHOD FOR UNSUPERVISED LEARNING OF ACOUSTIC PATTERNS IN SPEECH
DEPARTMENT OF COMMUNICATIONS ENGINEERING, TECHNICAL REPORT FGNT-2013-01

In this paper we present a novel initialization method for unsupervised learning of acoustic patterns in recordings of continuous speech. The pattern discovery task is solved by dynamic time warping, whose performance we improve by a smart starting point selection. This enables a more accurate discovery of patterns than conventional approaches. After graph-based clustering, the patterns are used to train hidden Markov models for unsupervised speech acquisition. By iterating between model training and decoding in an EM-like framework, the word accuracy is improved from iteration to iteration. On the TIDIGITS corpus the proposed unsupervised pattern discovery approach achieves a word error rate of about 13%, assuming knowledge of neither the acoustic units nor the labels of the training data.
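For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the textbook whole-sequence variant. It is not the method of this report: it aligns two complete sequences from their first to their last frame and contains no starting point selection or pattern discovery. The function name `dtw_distance`, the Euclidean local cost, and the step pattern (horizontal, vertical, diagonal) are assumptions made for illustration only.

```python
import numpy as np


def dtw_distance(x, y):
    """Accumulated cost of the best textbook DTW alignment of x and y.

    x, y: sequences of feature frames, shape (frames, dims) or (frames,).
    Hypothetical helper for illustration; not the report's algorithm.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as a sequence of scalar frames.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Local cost: Euclidean distance between every frame pair (assumption).
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # acc[i, j] = cheapest cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]
```

With this step pattern, a sequence and a time-stretched copy of it (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) align at zero cost, which is the property that makes DTW suitable for comparing speech segments spoken at different rates.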
