Extraction of Displayed Objects Corresponding to Demonstrative Words for Use in Remote Transcription

A previously proposed system that uses demonstrative words and phrases together with pointing gestures to extract target objects displayed during lectures has now been evaluated. The system identifies pointing gestures by analyzing the trajectory of the speaker's stick pointer and extracts the objects to which the speaker points. The extracted objects are displayed on the monitor of a transcriber at a remote location, helping the transcriber translate each demonstrative word or phrase into a short description of the object. Testing with video of an actual lecture showed that the system had a recall of 85.7% and a precision of 84.8%. Testing with two extracted scenes showed that transcribers replaced significantly more demonstrative words with short descriptions of the target objects when the extracted objects were displayed on their monitors. A transcriber using this system can thus transcribe speech more easily and produce more meaningful transcriptions for hearing-impaired listeners.
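
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how recall and precision are conventionally computed from detection counts. The function name and the counts in the usage lines are illustrative assumptions; they are not the counts from the evaluation, which are not given here.

```python
def recall_precision(true_positives, false_negatives, false_positives):
    """Return (recall, precision) for a detection task.

    recall    = correct detections / all items that should have been detected
    precision = correct detections / all items the system reported
    """
    should_detect = true_positives + false_negatives
    reported = true_positives + false_positives
    # Guard against empty denominators so the function never divides by zero.
    recall = true_positives / should_detect if should_detect else 0.0
    precision = true_positives / reported if reported else 0.0
    return recall, precision


# Hypothetical counts, chosen only to illustrate the arithmetic.
recall, precision = recall_precision(
    true_positives=8, false_negatives=2, false_positives=2
)
```

In this setting, a missed target object would lower recall, while an object extracted when the speaker was not pointing at it would lower precision.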