ASR - articulatory speech recognition

The hidden Markov model (HMM) is the model that has made large-vocabulary automatic speech recognition (ASR) possible. The HMM is robust and versatile, and has at its disposal a host of efficient algorithms for training, speaker adaptation and recognition. However, there is nothing uniquely speech-orientated about the HMM. In fact, it makes certain assumptions about speech which are known to be untrue. For example, speech is modelled as a piecewise stationary process when we know it to be continuous. Also, coarticulation, which should be a rich source of information, is treated simply as unwanted variation. This variation is generally accounted for by modelling every phone in every context, which in turn leads to problems of data sparsity and makes elaborate parameter-tying schemes necessary.

Speech is generally modelled in a parametrised version of the acoustic domain, which is natural given that this is the data we have most ready access to. Any practical speech recogniser must of course take acoustic waveforms as input; however, treating these in isolation from the production mechanism which created them ignores a rich source of prior knowledge. We propose that modelling speech in the articulatory domain using linear dynamic models (see section 4) will address some of these issues. The data here consists of trajectories which evolve smoothly over time, namely the coordinates of points on the articulators. Effects such as coarticulation and assimilation are most simply described in articulatory terms, whereas in acoustic terms they are confounded with the representation. Models that work in the articulatory domain are therefore able to model these phenomena explicitly.

We have access to real articulatory data, collected by Alan Wrench at Queen Margaret College, Edinburgh (see [1] for further details). This data has been used to train neural networks to recover articulatory traces from the acoustics.
In our experiments we have used both real and automatically recovered articulation.