Speech recognition in the human-computer interface

Researchers have studied human speech interaction with computers for many years. Much of the work in this area has focused on building technically better speech recognition (SR) systems, and almost all of the testing has centered on accuracy and productivity gains. There has been little study of other issues, such as user acceptance. This paper reports the results of an experiment investigating word generation rates, word error rates, and user acceptance of a speech recognition program as compared to typing. Although the subjects made more errors when using the speech recognition software, they were able to generate more than twice as much text in the same amount of time. However, this relative efficiency was not enough to overcome the inaccuracy of the software and the annoyance of fixing so many errors.
