Automatic Pronunciation Error Detection Based on Extended Pronunciation Space Using the Unsupervised Clustering of Pronunciation Errors

Computing posterior probabilities within a standard pronunciation space (SPS) is a common approach to automatic pronunciation error detection (APED). However, for pronunciation errors that fall outside the SPS, such methods can give only an approximate solution, which may not be accurate enough in many applications. This paper extends the SPS to cover more pronunciation errors, introduces a Bhattacharyya-distance-based clustering of pronunciation errors, and thereby obtains more detailed acoustic models for APED within the extended pronunciation space (EPS). The relationship between the performance of the APED system and the number of clusters or the size of the EPS is studied. Experimental results show that, compared with APED based on the SPS, APED based on the EPS using adaptive unsupervised clustering of pronunciation errors achieves better performance: the average scoring error rate (ASER) decreases from 0.412 to 0.301, a relative reduction of 26.94%.
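
To illustrate the kind of Bhattacharyya-distance-based clustering described above, the following is a minimal sketch, not the paper's actual procedure. It assumes each pronunciation-error model is summarized by a single diagonal-covariance Gaussian with a sample count, and it uses a greedy merge with a stopping threshold as one possible way to let the number of clusters adapt. The function names (`bhattacharyya_diag`, `merge_gaussians`, `cluster_errors`) and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance in this dimension
        dist += 0.125 * (m1 - m2) ** 2 / v              # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return dist


def merge_gaussians(a, b):
    """Moment-matched merge of two (count, mean, var) summaries (assumed scheme)."""
    na, ma, va = a
    nb, mb, vb = b
    n = na + nb
    mean = [(na * x + nb * y) / n for x, y in zip(ma, mb)]
    var = [
        (na * (s + x * x) + nb * (t + y * y)) / n - m * m
        for s, x, t, y, m in zip(va, ma, vb, mb, mean)
    ]
    return (n, mean, var)


def cluster_errors(models, threshold):
    """Greedily merge the closest pair until no pair is closer than threshold."""
    clusters = list(models)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = bhattacharyya_diag(
                    clusters[i][1], clusters[i][2],
                    clusters[j][1], clusters[j][2],
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are all sufficiently distinct
        merged = merge_gaussians(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy one-dimensional example: two near-identical error models and one distant one.
toy_models = [(10, [0.0], [1.0]), (10, [0.1], [1.0]), (10, [5.0], [1.0])]
toy_clusters = cluster_errors(toy_models, threshold=0.1)
```

In this sketch the threshold, rather than a fixed cluster count, decides when merging stops, so the resulting size of the extended space depends on how distinct the error models are. A real system would more likely compare HMM state distributions or Gaussian mixtures than single Gaussians.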