Paper Information - Speech Recognition Architectures for Multimedia Environments

Speech Recognition Architectures for Multimedia Environments

1 Abstract

Computer workstations have recently become powerful enough to support speech recognition entirely in software, but speech recognizers still vary in their functionality, and each vendor offers its own programmatic interface. Developing recognition applications currently means writing to non-portable protocols. As new, improved recognizers become available, such applications will need to be rewritten for new protocols. A recognition server can abstract such differences from client applications while supporting the different classes of recognizers available today. This paper describes the design and implementation of an asynchronous recognition server; the asynchronous server allows a client application to continue operation and possibly attend to other input and output events while waiting for recognition to complete. Its internal architecture is based on an object-oriented engine application programming interface (API). The server is designed to support speaker-independent, connected-speech recognition and speaker-dependent, isolated-word recognition paradigms, while offering applications a consistent programmatic interface for accessing this functionality. As recognition engines with better performance arrive, they can be incorporated into the server via a standardized engine API and automatically become available to all client applications without modification.

2 Motivations and Applications

Today, sophisticated multimedia environments include not only graphical input but also audio input, and applications that use speech recognition are beginning to appear in such environments. Unfortunately, while many of the state-of-the-art recognition systems offer similar functionality, they also have non
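
As a rough illustration of the design outlined in the abstract (an asynchronous server that hides different recognizers behind one object-oriented engine API), the following Python sketch may help. It is not the paper's implementation: the names `RecognitionEngine`, `ConnectedSpeechEngine`, `IsolatedWordEngine`, and `RecognitionServer` are hypothetical, and the two "engines" are stand-ins that merely decode bytes rather than recognize speech.

```python
import asyncio
from abc import ABC, abstractmethod


class RecognitionEngine(ABC):
    """Hypothetical engine API: each recognizer is wrapped behind this interface."""

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Blocking call that returns the recognized text."""


class ConnectedSpeechEngine(RecognitionEngine):
    # Stand-in for a speaker-independent, connected-speech recognizer.
    def recognize(self, audio: bytes) -> str:
        return "connected:" + audio.decode()


class IsolatedWordEngine(RecognitionEngine):
    # Stand-in for a speaker-dependent, isolated-word recognizer:
    # it only ever returns a single word.
    def recognize(self, audio: bytes) -> str:
        return "isolated:" + audio.decode().split()[0]


class RecognitionServer:
    """Gives clients one consistent interface, whichever engine is used."""

    def __init__(self) -> None:
        self._engines = {}

    def register(self, name: str, engine: RecognitionEngine) -> None:
        # A new engine becomes available to clients without client changes.
        self._engines[name] = engine

    async def recognize(self, name: str, audio: bytes) -> str:
        engine = self._engines[name]
        loop = asyncio.get_running_loop()
        # Run the blocking engine call in a worker thread so the caller's
        # event loop stays free to handle other events.
        return await loop.run_in_executor(None, engine.recognize, audio)


async def demo() -> list:
    server = RecognitionServer()
    server.register("connected", ConnectedSpeechEngine())
    server.register("isolated", IsolatedWordEngine())

    events = []
    # Start recognition, then attend to other work before collecting the result.
    pending = asyncio.create_task(server.recognize("connected", b"open the file"))
    events.append("client handled other input")
    events.append(await pending)
    events.append(await server.recognize("isolated", b"stop now"))
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the client code never touches an engine directly, so swapping in a different engine only requires another `register` call on the server side.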
