An adaptive unsupervised clustering of pronunciation errors for automatic pronunciation error detection

This paper expands the standard pronunciation space (SPS) to include pronunciation errors for automatic pronunciation error detection (APED). It uses HMMs to represent the different distributions of pronunciation errors, proposes an adaptive unsupervised clustering of pronunciation errors based on similarity measures between HMMs, and then builds more refined acoustic models for APED within the resulting extended pronunciation space (EPS). Experimental results show that EPS-based APED using the adaptive unsupervised clustering outperforms the baseline system: the average scoring error rate (ASER) decreases from 0.415 to 0.302, a relative reduction of 27.23%. We also discuss the relationship between the number of clusters and APED performance, as well as a strategy for updating the models with unlabeled pronunciation errors.
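The abstract does not state which HMM similarity measure or clustering criterion is used, so the following is only a minimal sketch under assumptions of our own. It substitutes the Juang–Rabiner symmetric distance (estimated by sampling a sequence from each model and comparing per-symbol log-likelihoods) and average-linkage agglomerative clustering that stops once the closest pair of clusters is farther apart than a threshold. Because of that stopping rule, the number of clusters is not fixed in advance, which is one way to make the clustering "adaptive". Toy discrete-output HMMs stand in for acoustic models of pronunciation errors; `DiscreteHMM`, `hmm_distance`, `adaptive_cluster`, and the `threshold` value are all hypothetical names and parameters for illustration. As a side check on the reported figures, (0.415 - 0.302) / 0.415 ≈ 0.2723, which matches the stated 27.23% relative reduction.

```python
import math
import random
from itertools import combinations


class DiscreteHMM:
    """Hypothetical toy HMM with discrete outputs: initial probs pi, transitions A, emissions B."""

    def __init__(self, pi, A, B):
        self.pi, self.A, self.B = pi, A, B
        self.n_states = len(pi)

    def sample(self, length, rng):
        """Draw an observation sequence of the given length from the model."""
        obs = []
        state = rng.choices(range(self.n_states), weights=self.pi)[0]
        for _ in range(length):
            obs.append(rng.choices(range(len(self.B[state])), weights=self.B[state])[0])
            state = rng.choices(range(self.n_states), weights=self.A[state])[0]
        return obs

    def log_likelihood(self, obs):
        """Scaled forward algorithm: returns log P(obs | model)."""
        n = self.n_states
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                alpha = [sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][obs[t]]
                         for j in range(n)]
            scale = max(sum(alpha), 1e-300)  # guard against log(0)
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_p


def hmm_distance(m1, m2, length=2000, seed=0):
    """Assumed measure: symmetric Juang-Rabiner distance, estimated by Monte Carlo."""
    rng = random.Random(seed)
    o1, o2 = m1.sample(length, rng), m2.sample(length, rng)
    d12 = (m1.log_likelihood(o1) - m2.log_likelihood(o1)) / length
    d21 = (m2.log_likelihood(o2) - m1.log_likelihood(o2)) / length
    return 0.5 * (d12 + d21)


def adaptive_cluster(models, threshold):
    """Average-linkage clustering that stops when no two clusters are closer than threshold."""
    dist = {(i, j): hmm_distance(models[i], models[j])
            for i, j in combinations(range(len(models)), 2)}
    clusters = [[i] for i in range(len(models))]

    def linkage(c1, c2):
        return sum(dist[min(a, b), max(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (p, q), d = min((((p, q), linkage(clusters[p], clusters[q]))
                         for p, q in combinations(range(len(clusters)), 2)),
                        key=lambda item: item[1])
        if d >= threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[p] += clusters[q]  # p < q, so deleting q leaves p's index valid
        del clusters[q]
    return clusters


# Two near-identical "error" models and one clearly different one (made-up parameters).
A = [[0.7, 0.3], [0.4, 0.6]]
m_a = DiscreteHMM([0.6, 0.4], A, [[0.8, 0.2], [0.3, 0.7]])
m_b = DiscreteHMM([0.6, 0.4], A, [[0.78, 0.22], [0.32, 0.68]])
m_c = DiscreteHMM([0.6, 0.4], A, [[0.05, 0.95], [0.1, 0.9]])

clusters = adaptive_cluster([m_a, m_b, m_c], threshold=0.1)
print(clusters)  # expected: [[0, 1], [2]]
```

Raising or lowering `threshold` changes how many clusters survive, which is the kind of trade-off between cluster count and detection performance the abstract mentions; in a real system the per-cluster models would then be retrained on the pooled error data.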