Paper information - Experimentation on Spoken Format of Tables in Auditory User Interfaces

The acoustic representation of complex visual structures involves both synthesized speech and non-speech audio signals. Although progress in speech synthesis allows consistent control of many parameters, such as prosody, through appropriate mark-up, there is not enough experimentally validated specification data to drive a Voice Browser for such purposes. This paper reports the results of a series of psychoacoustic experiments that aim to provide a natural-speech prosodic specification for the task of vocalizing tables. Blind and sighted listeners were asked to reconstruct simple and complex data tables from naturally spoken descriptions. From the listeners’ feedback it was deduced that consistent prosodic rendering can model the underlying semantic structure of tables.

Georgios Kouroupetroglou | Dimitris Spiliotopoulos | Gerasimos Xydas | Vasilios Argyropoulos
