Unsupervised word segmentation from noisy input

In this paper we present an algorithm for the unsupervised segmentation of a character or phoneme lattice into words. Using a lattice as input rather than a single string accounts for the uncertainty of the character/phoneme recognizer about the true label sequence. An example application is the discovery of lexical units from the output of an error-prone phoneme recognizer in a zero-resource setting, where neither the lexicon nor the language model is known. Recently, a Weighted Finite State Transducer (WFST) based approach was published which, as we show, suffers from a flaw: the language model probabilities of known words are computed incorrectly. Fixing this issue greatly improves precision and recall, but at the cost of increased computational complexity, which makes the corrected method practical only for single input strings. To allow for a lattice input, and thus for errors of the character/phoneme recognizer, we propose a computationally efficient, suboptimal two-stage approach, which is shown to significantly improve word segmentation performance over the earlier WFST approach.
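
To make the task concrete, the following is a minimal, self-contained sketch of segmenting a phoneme lattice into words; it is not the algorithm proposed here. The lattice is simplified to a confusion network (one set of weighted phoneme alternatives per time slot), and the lexicon and unigram word probabilities are assumed to be given, whereas in the zero-resource setting addressed in the paper both would have to be learned from the data. All identifiers and toy values are hypothetical.

```python
# Minimal sketch (not the authors' algorithm): Viterbi-style segmentation of a
# simplified phoneme lattice into words under a known toy lexicon and unigram
# language model.  The lattice is represented as a confusion network: one
# dictionary of phoneme -> posterior probability per time slot.
import math

# Hypothetical toy inputs, for illustration only.
lattice = [
    {"h": 0.9, "f": 0.1},
    {"e": 0.8, "i": 0.2},
    {"l": 1.0},
    {"o": 0.7, "u": 0.3},
    {"w": 0.6, "v": 0.4},
    {"o": 0.9, "a": 0.1},
    {"r": 1.0},
    {"l": 0.8, "n": 0.2},
    {"d": 0.9, "t": 0.1},
]
lexicon = {"helo": 0.4, "world": 0.4, "he": 0.1, "low": 0.1}  # word -> unigram prob

def segment(lattice, lexicon):
    """Return the best-scoring word sequence covering the lattice and its log score."""
    n = len(lattice)
    # best[i] = (log score of best segmentation of slots 0..i-1, start of last word, last word)
    best = [(-math.inf, None, None)] * (n + 1)
    best[0] = (0.0, None, None)
    for start in range(n):
        if best[start][0] == -math.inf:
            continue
        for word, p_word in lexicon.items():
            end = start + len(word)
            if end > n:
                continue
            # Acoustic score: sum of log posteriors of the word's phonemes in the lattice.
            acoustic = 0.0
            feasible = True
            for offset, ph in enumerate(word):
                p = lattice[start + offset].get(ph)
                if p is None:
                    feasible = False
                    break
                acoustic += math.log(p)
            if not feasible:
                continue
            score = best[start][0] + acoustic + math.log(p_word)
            if score > best[end][0]:
                best[end] = (score, start, word)
    # Backtrace the best path from the end of the lattice.
    words, i = [], n
    while i > 0 and best[i][1] is not None:
        words.append(best[i][2])
        i = best[i][1]
    return list(reversed(words)), best[n][0]

print(segment(lattice, lexicon))
```

Running the script segments the nine-slot toy lattice into "helo world", the path that maximizes the combined acoustic and unigram language model score.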
