“Usable Accessibility” to the Web for Blind Users

Websites were originally conceived for a text-based interface, but very early (from MOSAIC on) they became the epitome of graphic interface applications. A Web page is inherently complex, since it simultaneously conveys pieces of content, the relationships among them, “links” to other pages, etc. While the core of the communication lies in the content, a lot of additional but often fundamental semantics comes from visual features, i.e. layout, colors, fonts, spatial relationships, positioning within the page, etc. In addition, in modern Web applications a large portion of the content is visual (e.g. pictures, graphics) or based on visual perception (e.g. tables, diagrams). Accessing the Web can therefore be difficult, in many respects, for users with disabilities. In this paper we address the issue of accessibility for a specific type of disability, i.e. blindness. The most useful technology devised for blind users is based upon “screenreaders”, i.e. software tools that “read” pages aloud. The W3C consortium, through the WAI initiative, has provided guidelines that tell developers what they should (or should not) do in order to build “readable” pages. The main thesis of this paper is that the W3C guidelines only guarantee “technical readability”, i.e. the fact that screenreaders can work; they do not at all ensure that a Website is “accessible” to blind users, in the sense that blind users can effectively access it. For this reason we advocate “usable accessibility”, which ensures an effective user experience, as opposed to “technical accessibility”, which is the main concern of the W3C guidelines.
In the paper we present some empirical solutions toward usable accessibility that we have devised for a specific site (www.munchundberlin.org), as well as a longer-term approach (WED – WEb as Dialog), based upon linguistic research and on the assumption that a Web experience can be treated as a kind of dialogue between a user and a machine, and can therefore be compared (in terms of quality and effectiveness) to a dialogue between the same user and another human being (the curator of the exhibition, for example).
This research was partially funded by the Swiss National Fund (contract FNRS 105211-102061/1) and by Culture2000 (project HELP), a research program of the European Commission.

1. HOW BLIND PEOPLE ACCESS THE INTERNET

Whilst character-based interfaces offered blind people the extraordinary possibility of exploiting their skills in using keyboards and interacting with software tools, graphic interfaces, with their complex page layouts, many visual features and above all the use of the mouse, have made access to the many valuable resources offered by the Web a difficult and cumbersome task. Developing separate Websites specially dedicated to this category of users is definitely not the right solution. First of all, not all institutions would be willing to pay twice, to develop and then keep up to date two different Websites; a check of the multilingual versions of many Websites clearly demonstrates that usually the main Website is updated whilst its “foreign clones” are left behind in terms of graphics, content, services, etc. Moreover, blind users themselves refuse to be “ghettoized”, claiming rather that a better design would enhance the efficiency and satisfaction of the Web experience for any kind of user (Theofanos & Redish, 2003).
Visually impaired people currently access the Web by using screenreaders, that is, software tools capable of interpreting the HTML code and reading it aloud (with a synthesized voice); interaction takes place through Braille keyboards. The worth of screenreaders is clear; nonetheless, their limits too are starting to be recognized and discussed in the literature (see again Theofanos & Redish, 2003). We shall briefly recall them here (Di Blas et alii, 2004):
• They read everything, including elements of HTML that are useful for visualization only (and do not convey relevant meaning to the listener).
• They have (by default, at least) a simplistic listening strategy, “top-to-bottom/left-to-right”, which makes waiting for the relevant piece of information difficult and boring. The reader is invited to read aloud a page of a daily newspaper adopting the same strategy, and to measure how long it takes until something relevant is read.
• They fail to convey the overall organization of the page, with the relative priorities of its different parts.
• They interleave the reading of content with the reading of links, which is thoroughly confusing for the listener. The listener can get the list of links (in alphabetical order) without the content, but s/he cannot get the content without the links! In addition, even the alphabetical list is not effective: what if many links begin with the same word? Or if they are in interrogative form, for example all beginning with “where can I find...”? Again, this means time and patience spent waiting for the meaning of the links to become clear, or wrong and time-consuming moves within the site (Theofanos & Redish, 2003).
• The mechanisms for selecting links are difficult and cumbersome. While in theory it is possible to “confirm” the selection while “listening” to a link, in practice, due to synchronization problems (of the audio with the current position on the page), it almost never works.
• Page layout and the “graphic’s semantics” (that is, font size and color, position on the page) are completely lost: the metallic voice of the screenreader reads all the pieces of information on the page one by one with the same emphasis and tone (the landmarks, the main content, the service links...), as if they all shared the same degree of importance.
The point is that screenreaders are... “screen-readers”, that is, they basically read what appears on the screen, with a “book-reading” strategy, as if this were the most plausible equivalent of the “at a glance” comprehension of a sighted user. As we will argue later, the key to the solution is to separate the visual from the audio experience: not all that is written or visualized must be read, and not all that is read by the screenreader must be visualized on the screen. Some of the problems of screenreaders are “technical”, in the sense that they can be (almost) mechanically checked, while others are more “conceptual”, involving design techniques and usability issues. In the next section we shall discuss in detail the latest version (version 2.0, in preparation) of the W3C guidelines, and we will argue that they correctly address technical accessibility issues, but are vague (if not wrong) on design and usability issues.

2. THE W3C GUIDELINES: A CRITICAL OVERVIEW

The W3C consortium made public a first set of guidelines in May 1999. The second version of these guidelines is currently under preparation (www.w3.org/TR/2003/WD-WCAG20-20030624). It consists of 4 major guidelines prescribing that an application should be perceivable, operable, understandable and robust. For each of the four guidelines, checkpoints (18 in total) are defined. For each checkpoint (checkpoints are considered normative), definitions, benefits and examples (non-normative) are provided.
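Before turning to the individual checkpoints, the default listening strategy criticized in Section 1 can be made concrete with a toy model. The following Python sketch is only an illustration under simplifying assumptions, not a description of how any real screenreader is implemented: the `LinearReader` class, the `read_aloud` function and the sample page are hypothetical names and data introduced here. It linearizes a page in source order, interleaving links with content, and also builds the alphabetical list of links.

```python
from html.parser import HTMLParser


class LinearReader(HTMLParser):
    """Toy model (hypothetical) of a 'top-to-bottom' listening strategy."""

    def __init__(self):
        super().__init__()
        self.utterances = []   # everything that is "read", in source order
        self.links = []        # link labels only
        self._in_link = False
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            label = " ".join(self._link_text)
            # Links are announced in the middle of the content stream.
            self.utterances.append("link: " + label)
            self.links.append(label)
            self._in_link = False

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        if self._in_link:
            self._link_text.append(text)
        else:
            self.utterances.append(text)


def read_aloud(html):
    """Return (linear reading, alphabetical list of links) for a page."""
    reader = LinearReader()
    reader.feed(html)
    reader.close()
    return reader.utterances, sorted(reader.links, key=str.lower)


# Invented sample page: a navigation bar, a title, one sentence of content.
PAGE = """
<div><a href="/home">Home</a> <a href="/help">Where can I find help?</a></div>
<h1>Exhibition</h1>
<p>The show opens in May. <a href="/tickets">Where can I find tickets?</a></p>
"""

utterances, link_list = read_aloud(PAGE)
for line in utterances:
    print(line)
print(link_list)
```

On this sample page the only sentence of actual content is the fourth utterance, heard after two navigation links and the title, and two of the three entries in the alphabetical list begin with the same words (“Where can I find”), so the listener must wait for the end of each label to tell them apart.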
Checkpoints are classified as either “core” or “extended”: to conform to WCAG 2.0, the Required Success Criteria of Core Checkpoints must be satisfied; the “extended” ones are additional checkpoints that may be reported in addition to Core conformance. Let us comment on the guidelines in detail, as defined by the W3C; we should remind readers that, while the W3C addresses all kinds of disabilities, we comment on the guidelines taking only blind users into consideration.

Guideline 1: PERCEIVABLE. Make Content Perceivable by Any User

1.1 [CORE] All non-text content that can be expressed in words has a text equivalent of the function or information that the non-text content was intended to convey.
This is a concern about content: the idea is that graphic and visual content should have a text equivalent. Still, what equivalence means is very difficult to define (see figure 1): which words are equivalent to a painting, an image or a map? Should the text convey the look, the semantics, the emotion, or something else? It is obvious that mechanically satisfying the guideline will not ensure “real” accessibility.
Figure 1: from the Museum of Modern Art Website (www.moma.org): is this a text equivalent of the picture’s meaning?
1.2 [CORE] Synchronized media equivalents are provided for time-dependent presentations.
Time-dependent presentations, with audio synchronized to changing images, for example, are clearly a major problem for blind users.
1.3 [CORE] Both [information/substance] and structure are separable from presentation.
This is an important guideline, the potential meaning of which is much deeper than the W3C guidelines seem to imply. We should remind the reader that the key problem lies with HTML, where presentation is intermingled with content. In addition, the guidelines focus on presentation details (which are important) and substantially neglect the problem of presentation strategy (which is even more important than the details).
Furthermore, they overlook the fact that “reading aloud” a page requires a presentation strategy of its own: an “oral strategy”, very different from the visualization-based one commonly used for Web pages.
1.4 [CORE] All characters and words in the content can be unambiguously decoded.
This is a technical requirement, necessary and, in a sense, obvious.
1.5 [EXTENDED] Structure has been made perceivable to more people through presentation(s), positioning, and labels.
This is a very ambiguous and, in a sense, incorrect guideline. It is (practically) impossible and (above all) useless to attempt to describe in words the “look” of a Web page. The reader may try this simple experiment: try to read a page of a daily newspaper to someone else. Very likely the reader will try to read aloud the semantics (