Expediting the Process of EPUB 3 Book Generation
Generating a standardized audio book for blind readers is a challenging task. Users with visual disabilities need navigation capability to peruse a book, and a full-audio, full-text format provides this capability and would therefore benefit them the most. Unfortunately, very few full-audio, full-text books are available in Thai. One way to expedite the production of this type of audio book is to use speech synthesis when the text is readily available. Nonetheless, human intervention is unavoidable for describing non-text materials such as graphs and figures. This paper therefore proposes a software program called "EPUB 3 Audio Book Generator", which produces EPUB audio books. The program takes an EPUB 3 text book without audio as input. Audio is then synthesized by a Thai speech synthesis engine, CHULA-TTS, and synchronized with the text using Media Overlays. Users can modify the synthesized audio to improve its accuracy. In addition, to voice non-text elements, the program allows users to insert image captions so that the output audio book is complete. We evaluated the software by using it to convert a chapter of a Thai-language textbook for an Engineering Drawing class and found the generation rate to be approximately one hundred sentences per minute. Correcting the thirty errors caused by the synthesizer took ninety minutes. Of the ninety-nine images in the chapter, forty had to be described, which took about three hours. These results suggest that the software is promising for shortening the process of generating full-audio, full-text books for anyone in need.
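The abstract does not describe the program's internals. To illustrate what synchronizing synthesized audio with text through EPUB 3 Media Overlays involves, here is a minimal Python sketch. It assumes that each sentence, and each user-written image caption, in the XHTML content document carries its own `id`, and that the start and end times of its synthesized audio clip are known. The `Segment`, `clock`, and `build_overlay` names, as well as the file names and ids, are hypothetical. The element and attribute names (`smil`, `body`, `par`, `text`, `audio`, `clipBegin`, `clipEnd`, `epub:textref`) follow the EPUB 3 Media Overlays format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SMIL_NS = "http://www.w3.org/ns/SMIL"
EPUB_NS = "http://www.idpf.org/2007/ops"


@dataclass
class Segment:
    """One synchronized unit (hypothetical structure, not the paper's)."""
    text_id: str    # id of a sentence or image-caption element in the XHTML file
    audio_src: str  # synthesized audio file for this unit (hypothetical naming)
    start: float    # clip start, in seconds
    end: float      # clip end, in seconds


def clock(seconds: float) -> str:
    """Format seconds as a SMIL full clock value, e.g. 0:00:02.500."""
    millis = round(seconds * 1000)
    h, rem = divmod(millis, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"


def build_overlay(xhtml_href: str, segments: list[Segment]) -> str:
    """Build a Media Overlay (SMIL) document pairing text fragments with audio clips."""
    ET.register_namespace("", SMIL_NS)       # SMIL as the default namespace
    ET.register_namespace("epub", EPUB_NS)   # epub: prefix for epub:textref
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(
        smil, f"{{{SMIL_NS}}}body", {f"{{{EPUB_NS}}}textref": xhtml_href}
    )
    for i, seg in enumerate(segments, start=1):
        # Each <par> plays its <text> and <audio> children in parallel.
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{i}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{xhtml_href}#{seg.text_id}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}audio", {
            "src": seg.audio_src,
            "clipBegin": clock(seg.start),
            "clipEnd": clock(seg.end),
        })
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    demo = [
        Segment("s1", "audio/ch1_s1.mp3", 0.0, 2.5),                # a synthesized sentence
        Segment("fig1-caption", "audio/ch1_fig1.mp3", 0.0, 4.2),    # a user-written caption
    ]
    print(build_overlay("chapter1.xhtml", demo))
```

Each `par` pairs one text fragment with one audio clip. A reading system can therefore highlight a sentence while its audio plays and let the reader navigate sentence by sentence. In a complete EPUB 3 package, the manifest item for the XHTML file would also point to this SMIL file through its `media-overlay` attribute.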