Acoustic Modelling for Speech Processing in Complex Environments

Automatic Speech Recognition (ASR) is a classical application of multivariate statistical modelling that involves tasks such as Acoustic Modelling (AM) and Language Modelling (LM). These tasks are generally highly language-dependent and require very large resources. This work focuses on the selection of appropriate acoustic models for Speech Processing in a complex environment (a multilingual context in under-resourced and noisy conditions), oriented to general ASR tasks. The work has been carried out with a small trilingual speech database of very low audio quality. To reduce the negative impact of this lack of resources, two techniques have been selected. On the one hand, Hidden Markov Models have been enhanced using hybrid topologies and parameters as acoustic models of the sublexical units. On the other hand, an optimum configuration has been developed for the Acoustic Phonetic Decoding system, based on the number of multivariate Gaussians and the insertion penalty.