A multimodal browser for the World-Wide Web

Spoken Language Access to Multimedia (SLAM) is a spoken language extension to the graphical user interface of the World-Wide Web browser Mosaic. SLAM combines the complementary modalities of spoken language and direct manipulation to improve the interface to the wide variety of information available on the Internet. To make the advantages of spoken language systems available to a wider audience, speech recognition can be performed remotely across a network. This paper describes the design issues and architecture of what is believed to be the first spoken-language interface to the World-Wide Web that can be easily implemented across platforms.
