Auditory universal accessibility of data tables using naturally derived prosody specification

Text documents usually embed visually oriented meta-information in complex visual structures such as tables. Because the semantics of such structures are conveyed visually, rendering them with text-to-speech synthesis yields poor and ambiguous output. Although most speech synthesis frameworks allow consistent control of many parameters, including prosodic cues, through appropriate markup, no prosodic specification exists for speech-enabling visual elements. This paper presents a method for modelling the acoustic specification of simple and complex data tables, derived from the human paradigm. A series of psychoacoustic experiments was set up to obtain speech properties from prosodic analysis of natural spoken descriptions of data tables. Thirty blind and thirty sighted listeners selected the most prominent natural rendition. The derived phrase accent and pause break placement vectors were modelled using the ToBI semiotic system to convey semantically important visual information through prosody control. First-time listeners then evaluated the quality of information provision of speech-synthesized tables rendered with the proposed prosody specification. The results show a significant increase (14% to 20%, depending on the table type) in users' subjective understanding of the semantic structure of the table data (overall impression, listening effort and acceptance) compared to the traditional linearized speech synthesis of tables. Furthermore, it is shown that prosody manipulation can be applied successfully to data tables using generic specification sets for certain table types and browsing techniques, resulting in improved data comprehension.
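To make the contrast between linearized rendering and markup-driven prosody concrete, the following is a minimal sketch rather than the specification derived in the paper. It assumes SSML as the markup language and uses the `<break>` and `<prosody>` elements as stand-ins for pause break and phrase accent placement. The function names, the pause durations, and the pitch offset are hypothetical illustration values, not results from the experiments.

```python
from xml.sax.saxutils import escape

# Hypothetical pause lengths in milliseconds; NOT values reported in the paper.
CELL_BREAK_MS = 300
ROW_BREAK_MS = 700
# Hypothetical pitch offset used to set header labels apart from data cells.
HEADER_PITCH = "+10%"


def linearize(header, rows):
    """Baseline: read all cells left to right, top to bottom, with no cues."""
    cells = list(header) + [cell for row in rows for cell in row]
    return " ".join(str(cell) for cell in cells)


def table_to_ssml(header, rows):
    """Render each data cell as 'header, value' with pauses marking structure.

    A short break follows every header label and every value; a longer
    break closes each row, so row boundaries are audible.
    """
    parts = ["<speak>"]
    for row in rows:
        for name, value in zip(header, row):
            parts.append(
                f'<prosody pitch="{HEADER_PITCH}">{escape(str(name))}</prosody>'
                f'<break time="{CELL_BREAK_MS}ms"/>'
                f"{escape(str(value))}"
                f'<break time="{CELL_BREAK_MS}ms"/>'
            )
        parts.append(f'<break time="{ROW_BREAK_MS}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    header = ["City", "Temperature"]
    rows = [["Athens", 31], ["Oslo", 18]]
    print(linearize(header, rows))
    print(table_to_ssml(header, rows))
```

The linearized string carries no trace of which value belongs to which column, whereas the SSML output repeats the header before each value and separates rows with a longer pause. A specification of the kind the paper proposes would replace the constants above with values derived from the listener experiments.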
