Automatic correspondence calculation between text and speech for authoring digital talking book

The present paper proposes applying the voice-pause (VP) method to the authoring of DAISY talking books, which are used by visually impaired people. The proposed method enables authors to automatically calculate the timing information for sentence-level correspondence between Japanese text and the corresponding audio data, reducing the time required to perform searches. Several related studies also calculate this timing information, but they require the input audio data to have a specific speech style and to be short in duration. Therefore, in the present paper, the proposed VP method was used to measure the average gap time and the sentence detection rate on databases with different speech styles and on input audio data of long duration. The experimental results show an average gap time of approximately 0.38 sec and a sentence detection rate of approximately 94%, both of which are independent of speech style. The proposed VP method performs well and is efficient compared with methods proposed in previous studies.