Audio System for Technical Readings

The advent of electronic documents makes information available in more than its visual form: electronic information can now be display-independent. We describe a computing system, AsTeR, that audio-formats electronic documents to produce audio documents. AsTeR can speak both literary texts and highly technical documents that contain complex mathematics (presently those written in TeX or LaTeX). Visual communication is characterized by the eye's ability to actively access parts of a two-dimensional display. The reader is active, while the display is passive. The temporal nature of oral communication reverses these roles: information flows actively past a passive listener. This prohibits multiple views; it is impossible to first obtain a high-level view and then "look" at the details. These shortcomings become severe when complex mathematics is presented orally. Audio formatting, which renders information structure in a manner attuned to an auditory display, overcomes these problems. AsTeR is interactive, and the ability to browse information structure and obtain multiple views enables active listening.
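To make the idea of audio formatting more concrete, here is a minimal, hypothetical sketch in Python. It is not AsTeR's implementation: the node kinds, the cue phrases, and the names `Node`, `audio_format`, `speak`, and `max_depth` are all illustrative assumptions. The sketch walks an expression tree and emits spoken phrases tagged with their nesting depth, which a speech back end could map to pitch or pause length. Passing `max_depth` collapses deeper subtrees into short names such as "a sum", so a listener can hear a high-level view first and then request the details.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node in a hypothetical math expression tree."""
    kind: str                                  # "id", "frac", "sum", or "pow"
    children: List["Node"] = field(default_factory=list)
    text: str = ""                             # label, used only when kind == "id"


def leaf(text: str) -> Node:
    return Node("id", text=text)


# Illustrative cue phrase spoken before each child ("" = no cue).
# Binary constructs only in this sketch: zip() ignores any extra children.
CUES = {
    "frac": ["the fraction with numerator", "and denominator"],
    "sum": ["", "plus"],
    "pow": ["", "raised to"],
}

# Short names spoken in place of a subtree that the summary view collapses.
NAMES = {"frac": "a fraction", "sum": "a sum", "pow": "a power"}

# (words to speak, nesting depth of the node that produced them)
Cue = Tuple[str, int]


def audio_format(node: Node, depth: int = 0,
                 max_depth: Optional[int] = None) -> List[Cue]:
    """Render a tree as spoken phrases tagged with nesting depth.

    A speech back end could map depth to pitch or pause length. If
    max_depth is given, non-leaf subtrees at that depth or deeper are
    spoken as a short name instead of being expanded.
    """
    if node.kind == "id":
        return [(node.text, depth)]
    if max_depth is not None and depth >= max_depth:
        return [(NAMES[node.kind], depth)]
    out: List[Cue] = []
    for cue, child in zip(CUES[node.kind], node.children):
        if cue:
            out.append((cue, depth))
        out.extend(audio_format(child, depth + 1, max_depth))
    return out


def speak(cues: List[Cue]) -> str:
    """Flatten cues to plain text, e.g. for a TTS engine without prosody control."""
    return " ".join(words for words, _ in cues)


# (a + b) / x^2
expr = Node("frac", [Node("sum", [leaf("a"), leaf("b")]),
                     Node("pow", [leaf("x"), leaf("2")])])
print(speak(audio_format(expr, max_depth=1)))
# the fraction with numerator a sum and denominator a power
print(speak(audio_format(expr)))
# the fraction with numerator a plus b and denominator x raised to 2
```

Keeping the structural depth separate from the words lets one traversal drive different auditory renderings (pitch shifts, pauses, or plain text). A fuller system would need rules for many more constructs and cues chosen so that grouping stays unambiguous when heard.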

T. V. Raman
