Discovering Words from Continuous Speech – A study of two factor analysis methods.

Modern speech recognizers rely on preprogrammed knowledge of specific languages and on extensive annotated training examples. Children, however, are remarkably well adapted to learning language without such help. They learn from examples of the speech itself and from the environment in which they live. Modelling this learning process is an interesting but very complex topic, since it encompasses not only speech but all the senses a child has at its disposal. This thesis studies a subset of the problem, namely the discovery of words from continuous speech without prior knowledge of the language.

Two different methods are used for this purpose. The basic premise of both is to find frequently recurring patterns in spoken utterances, and both approach the problem in a similar manner. Given a matrix of fixed-length representations of utterances, each method decomposes the matrix into a weighted linear combination of sparse vectors. The first method is Beta Process Factor Analysis (BPFA), a recently developed non-parametric Bayesian method for factor analysis. It is modified here and applied to word discovery from continuous speech. The second method, Non-negative Matrix Factorization (NMF), has previously been applied for the same purpose and serves here as a reference.

Compared with NMF, the new method has the advantage of being able to infer the size of the basis. This also gives the number of recurring patterns, or word candidates, found in the data. BPFA and NMF are compared on the TIDigits database, and the results show that the new method finds not only the correct words but also the correct number of words. Tests on randomly generated sequences of words further demonstrate that the method can infer the approximate number of words for different vocabulary sizes.
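The shared idea of decomposing a matrix of utterance representations into weighted combinations of basis vectors can be illustrated with the reference method. Below is a minimal NMF sketch in Python with NumPy. It assumes the Lee-Seung multiplicative updates for a squared Euclidean cost, and it has no explicit sparsity penalty. The cost function, the random initialization and the function name `nmf` are illustrative assumptions, not necessarily the variant used in this thesis.

```python
import numpy as np


def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x utterances) as W @ H.

    Assumption: Lee-Seung multiplicative updates for ||V - W H||_F^2.
    Columns of W are basis vectors (candidate recurring patterns); column j
    of H holds the non-negative weights that combine them to approximate
    utterance j.
    """
    rng = np.random.default_rng(seed)
    n_features, n_utts = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_utts)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of W and H.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Usage: synthetic data built from 3 hidden non-negative patterns.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(V, k=3, n_iter=2000)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error
```

The basis size `k` must be fixed before the factorization is run. That is exactly the quantity BPFA instead infers from the data.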