Evaluation of Speech Balloon Captions for Auditory Information Support in Small Meetings

This paper addresses information support for hearing-impaired people. Automatic speech recognition, which converts speech to text, is a promising form of support for hearing-impaired people, and such studies include automatic captioning of TV programs and automatic transcription of oral presentations, lectures, and meetings. These studies mainly focused on how to recognize speech accurately, without paying attention to how the caption texts should be displayed. The display of caption texts has not been a significant problem because usually only a single speaker talks in TV news, oral presentations, and lectures. In meetings with more than one participant, however, it is important to display caption texts so that viewers can easily tell who is talking. In TV programs and movies, caption text is simply displayed at the bottom of the screen. This display method, which we call “TV-type caption” in this paper, is inadequate for meetings because it is hard to tell who is talking. Accordingly, we propose a caption display system that shows caption texts in speech balloons near the speakers' faces, based on automatic face detection and speech recognition. In this paper, we evaluate speech balloon captions and compare them with TV-type captions through a questionnaire on appearance, readability of caption text, and comprehension. We confirmed that speech balloon captions are adequate in terms of appearance and comprehension when several speakers are present, whereas TV-type captions are suitable in terms of appearance and readability of caption text when a single speaker talks.
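One simple way to position a speech balloon near a detected face is sketched below in Python. It is an illustration under stated assumptions, not the authors' method: it assumes a face detector that returns one pixel bounding box per speaker and a caption whose rendered width and height are already known. The names `Box` and `place_balloon`, the `margin` parameter, and the "above the face if it fits, otherwise below" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Axis-aligned rectangle in pixel coordinates (origin at the top-left).
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def place_balloon(face: Box, text_w: int, text_h: int,
                  frame_w: int, frame_h: int, margin: int = 10) -> Box:
    """Hypothetical placement rule for a caption balloon.

    Centres the balloon horizontally on the face, puts it above the face
    when there is room, otherwise below it, and finally clamps it so the
    whole balloon stays inside the video frame.
    """
    # Horizontal: centre on the face, then clamp to the frame edges.
    x = face.x + face.w // 2 - text_w // 2
    x = max(0, min(x, frame_w - text_w))

    # Vertical: prefer the space above the face, fall back to below.
    above = face.y - margin - text_h
    y = above if above >= 0 else face.y + face.h + margin
    y = max(0, min(y, frame_h - text_h))

    return Box(x, y, text_w, text_h)


if __name__ == "__main__":
    # A face in the middle of a 640x480 frame gets a balloon above it.
    print(place_balloon(Box(300, 200, 100, 100), 200, 50, 640, 480))
```

Placing the balloon relative to each speaker's own face is what lets a viewer attribute text to a speaker, which a fixed bottom-of-screen caption cannot do; a real system would also need to avoid overlaps between balloons of different speakers, which this sketch ignores.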