Paper information - Automatic correspondence calculation between text and speech for authoring digital talking book

This paper proposes applying the voice-pause (VP) method to the authoring of DAISY talking books, which are used by visually impaired people. The method automatically calculates the time information for sentence-level correspondence between Japanese text and the corresponding audio data, reducing the time required to perform searches. Several related studies also calculate such time information, but they require the input audio to have a specific speech style and a short duration. The VP method was therefore evaluated on databases with different speech styles and on long input audio, measuring the average gap time and the sentence detection rate. The experimental results show an average gap time of approximately 0.38 sec and a sentence detection rate of approximately 94%, both independent of speech style. The VP method performs well and is efficient compared with methods proposed in previous studies.
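The abstract does not describe the VP algorithm itself, so the following is only a rough sketch of the general idea of pause-based alignment and of the two reported measures. It assumes that pauses are runs of low-energy frames and that sentence boundaries fall at pause midpoints. The function names, the energy threshold, the minimum pause length, and the scoring tolerance are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def find_pauses(energies: List[float], frame_sec: float,
                threshold: float, min_pause_sec: float) -> List[Tuple[float, float]]:
    """Return (start, end) times of low-energy runs at least min_pause_sec long.

    Assumption: a pause is a run of frames whose energy is below `threshold`.
    """
    pauses = []
    run_start = None
    # A trailing sentinel above the threshold closes a run that reaches the end.
    for i, energy in enumerate(list(energies) + [threshold + 1.0]):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_sec >= min_pause_sec:
                pauses.append((run_start * frame_sec, i * frame_sec))
            run_start = None
    return pauses


def score(detected: List[float], reference: List[float],
          tolerance_sec: float) -> Tuple[float, float]:
    """Return (mean gap to the nearest detected boundary, fraction within tolerance).

    Assumption: these stand in for "average gap time" and "sentence detection
    rate"; the paper's exact definitions may differ.
    """
    gaps = [min(abs(r - d) for d in detected) for r in reference]
    hits = sum(1 for g in gaps if g <= tolerance_sec)
    return sum(gaps) / len(gaps), hits / len(reference)


if __name__ == "__main__":
    # Toy frame energies: speech (5.0) separated by silence (0.0), 0.25 s per frame.
    energies = [5.0, 5.0, 0.0, 0.0, 0.0, 5.0, 5.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
    pauses = find_pauses(energies, frame_sec=0.25, threshold=1.0, min_pause_sec=0.5)
    boundaries = [(start + end) / 2 for start, end in pauses]
    print(pauses, boundaries, score(boundaries, [1.0, 2.5], tolerance_sec=0.5))
```

In this sketch, the single silent frame between two speech frames is shorter than the minimum pause length and is ignored, which illustrates why a duration threshold matters when separating sentence-level pauses from brief within-sentence gaps.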

Masahide Sugiyama | Katsuyuki Watanabe
