Accessing Documents via Audio: An Extensible Transcoder for HTML to VoiceXML Conversion

The growing proliferation of hand-held devices and the need to deliver information to visually impaired persons have created a need to transcode web information into documents that can be delivered as audio. Web information is typically represented as HTML (HyperText Markup Language) documents, whereas audio delivery of web documents is done using VoiceXML. Because of this difference in markup notation, much of the web is inaccessible via audio. One way to solve this accessibility problem is to automatically transcode HTML documents to VoiceXML. In this paper, we describe such an automatic transcoder. The transcoder is compositional and is realized in two phases: the parsing phase, where the input HTML file is converted to an HTML node tree, and the semantic mapping phase, where each node in the HTML tree is compositionally mapped to its equivalent VoiceXML node. Our transcoder is extensible in the sense that (i) users can easily upgrade it to accommodate modifications to and extensions of HTML, and (ii) it provides a means for the user to modify the translation logic for certain HTML tags. The translator is being publicly distributed.
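
The two-phase design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `TreeBuilder`, `HANDLERS`, and `transcode` are hypothetical, and the tag-to-VoiceXML mapping shown here is an assumption chosen only to make the idea concrete. Phase 1 builds a node tree with Python's standard `html.parser`; phase 2 maps each node compositionally, so the output for a node is built from the outputs of its children. Extensibility is modeled as a handler table that a user can add to or override.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# HTML elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the HTML node tree; children are Nodes or text strings."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []


class TreeBuilder(HTMLParser):
    """Phase 1: parse HTML text into a node tree rooted at a synthetic node."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.stack[-1].children.append(text)


def map_children(node):
    """Compositional step: a node's output is built from its children's."""
    return " ".join(to_vxml(child) for child in node.children)


# Phase 2 mapping table (illustrative assumption, not the paper's rules).
# Users extend or override the translation by editing this dictionary.
HANDLERS = {
    "document": lambda n: (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        "<form><block>" + map_children(n) + "</block></form></vxml>"
    ),
    "h1": lambda n: "<prompt><emphasis>" + map_children(n) + "</emphasis></prompt>",
    "p": lambda n: "<prompt>" + map_children(n) + "</prompt>",
    "b": lambda n: "<emphasis>" + map_children(n) + "</emphasis>",
    "script": lambda n: "",  # not meaningful as audio
    "style": lambda n: "",
}


def to_vxml(node):
    """Map a tree node (or text) to VoiceXML markup."""
    if isinstance(node, str):
        return escape(node)
    # Tags without a handler contribute only their children's output.
    return HANDLERS.get(node.tag, map_children)(node)


def transcode(html):
    """Run both phases: parse to a tree, then map the tree to VoiceXML."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return to_vxml(builder.root)


if __name__ == "__main__":
    print(transcode("<h1>News</h1><p>Hello <b>world</b></p>"))
```

In this sketch, supporting a new or changed tag means adding one entry to the table, for example `HANDLERS["em"] = HANDLERS["b"]`, without touching the parser or the traversal. That separation is the property the abstract calls extensibility.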
