Robust Tracking for Automatic Reading Tutors

Reading tutor software uses automatic speech recognition technology to support children in developing their reading skills. In many forms of exercise and evaluation, tracking the reading position is a relevant task or even a prerequisite, for example to offer help with the pronunciation of a word or to advance the screen to the next page. In this paper, we introduce a new robust tracking algorithm that measures the similarity between the recognized phones and the phonetic transcription of the words displayed on the screen using an efficient dynamic programming algorithm. The criteria for accepting a word reading attempt, and thus for advancing the cursor, can therefore be expressed phonetically. In addition, the most likely state of the Hidden Markov Model (HMM) used to decode the speech serves as a fallback when phone matching fails. The new tracker’s performance is compared with that of two other trackers, which use either the most likely HMM state or phone matching alone. The evaluation metrics quantify both the frequency of timely cursor movements and the loss of tracking synchronicity. The proposed approach performs significantly better than the others, achieving a Timing Accuracy of Tracking of 81.03%, compared to 50.63% for the phone matching approach and 32.36% for the state-based approach.
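The abstract does not specify the similarity measure or the acceptance criteria, so the following is only a minimal sketch of the general idea under stated assumptions. It uses approximate substring matching (Sellers' dynamic programming variant of edit distance) to find the cheapest alignment of a word's phonetic transcription against any stretch of the recognized phone sequence, converts that cost into a similarity, and advances the cursor when the similarity passes a threshold. The names `best_match_cost` and `track`, the `threshold` and `window` parameters, and the `hmm_word_index` fallback argument are all hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Optional, Sequence


def best_match_cost(recognized: Sequence[str], target: Sequence[str]) -> int:
    """Minimum edit cost of aligning the whole target transcription against
    any contiguous substring of the recognized phones (Sellers' algorithm)."""
    m = len(target)
    # prev[i]: cost of aligning target[:i] ending at the previous recognized phone.
    # Before any recognized phone, every target phone must be deleted.
    prev = list(range(m + 1))
    best = prev[m]
    for phone in recognized:
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (0 if target[i - 1] == phone else 1)
            insert = prev[i] + 1        # extra recognized phone
            delete = cur[i - 1] + 1     # target phone not recognized
            cur[i] = min(substitute, insert, delete)
        best = min(best, cur[m])
        prev = cur
    return best


def track(recognized: Sequence[str],
          transcriptions: List[List[str]],
          cursor: int,
          hmm_word_index: Optional[int] = None,
          threshold: float = 0.75,   # hypothetical acceptance threshold
          window: int = 3) -> int:   # hypothetical look-ahead in words
    """Return the new cursor position (index of the next word to read)."""
    for k in range(cursor, min(cursor + window, len(transcriptions))):
        target = transcriptions[k]
        similarity = 1.0 - best_match_cost(recognized, target) / len(target)
        if similarity >= threshold:
            return k + 1  # word accepted phonetically: move past it
    # Phone matching failed: fall back to the word position suggested by
    # the most likely HMM state, if the decoder provides one (assumption).
    if hmm_word_index is not None:
        return hmm_word_index
    return cursor
```

For a displayed text "the cat sat" with transcriptions `[["dh", "ax"], ["k", "ae", "t"], ["s", "ae", "t"]]`, recognized phones `["dh", "ax", "k", "ae"]` move the cursor from 0 to 1, while a mispronounced `["k", "ae", "d"]` at cursor 1 scores a similarity of about 0.67 and is rejected, leaving the decision to the HMM fallback. Expressing acceptance as a phone-level similarity threshold is what lets the criteria be tuned phonetically, as the abstract describes.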