A comparison of L1 and African-mother-tongue acoustic models for South African English speech recognition

Speaker accent influences the performance of automatic speech recognition (ASR) systems. Knowledge of accent-based acoustic variations can therefore be used in the development of more robust systems. The goal of this project is to characterize the vowels and diphthongs of second language (L2) South African English to aid in the adaptation of existing first language (L1) English recognition systems for better L2 performance. This paper investigates the differences between the vowels and diphthongs of L1 and L2 English in South Africa and focuses specifically on L2 English speakers with an African mother tongue, for instance isi-Xhosa, isi-Zulu, Tswana or Sepedi. The vowel systems of English and of African languages, as described in the linguistic literature, are compared to predict the expected deviations of L2 South African English from the L1 norm. Acoustic models based on formant and Mel-scaled cepstral features of 80 context-dependent phonemes from L1 and L2 speakers are compared. Our findings agree well with the linguistic predictions; in particular, evidence of equivalence classification, peripheralization of schwa and changes in diphthong strength is observed.