Audio System for Technical Readings

The advent of electronic documents makes information available in more than its visual form--electronic information can now be display-independent. We describe a computing system, AsTeR, that audio formats electronic documents to produce audio documents. AsTeR can speak both literary texts and highly technical documents (presently in (La)TeX) that contain complex mathematics. Visual communication is characterized by the eye's ability to actively access parts of a two-dimensional display. The reader is active, while the display is passive. This active-passive role is reversed by the temporal nature of oral communication: information flows actively past a passive listener. This prohibits multiple views--it is impossible to first obtain a high-level view and then 'look' at details. These shortcomings become severe when presenting complex mathematics orally. Audio formatting, which renders information structure in a manner attuned to an auditory display, overcomes these problems. AsTeR is interactive, and the ability to browse information structure and obtain multiple views enables active listening.
