Computer vision-based analysis of web page structure for assistive interfaces

My PhD research aims to develop novel solutions to the challenge of identifying web page structure through visual analysis of web pages as images. The intention is then to combine this back-end design with various front-end applications to provide improved web experiences for users with assistive needs: for example, supporting more selective screen-reader output for visually impaired users, or helping users with cognitive impairments by reducing clutter or zooming in on selected page content.

I propose to build a comprehensive computer vision-based system that analyses the semantic structure of a web page purely from an image of the rendered page, producing a rich representation of the page as a tree of regions labelled according to their semantic roles.

Most research into web page segmentation has focused on the structure of the DOM tree and on visual features derived from properties specified in it. I argue, however, that the image of the rendered page may be a better representation to use: it is created by the page designer to convey the structure of the page to the user, whereas the source code and DOM tree are merely intended to make the browser's rendering engine produce the correct appearance, and treat many types of content as black boxes. Additionally, the proposed system uses exactly the information seen by the user, regardless of implementation method, which gives it advantages in implementation-independence and versatility.
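To make the proposed output concrete, the following is a minimal sketch of what a labelled region tree might look like. The role names, bounding-box convention, and class design here are illustrative assumptions, not part of the proposed system itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical set of semantic roles a segmentation system might assign.
ROLES = {"page", "header", "navigation", "main_content", "sidebar", "footer"}


@dataclass
class Region:
    """A rectangular region of the rendered page image with a semantic label."""
    role: str                                  # one of ROLES (assumed label set)
    bbox: Tuple[int, int, int, int]            # (x, y, width, height) in pixels
    children: List["Region"] = field(default_factory=list)

    def add(self, child: "Region") -> "Region":
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Region"]]:
        """Yield (depth, region) pairs in pre-order, e.g. for display or querying."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Example tree for a simple two-column page layout.
page = Region("page", (0, 0, 1024, 2048))
page.add(Region("header", (0, 0, 1024, 120)))
page.add(Region("main_content", (0, 120, 768, 1800)))
page.add(Region("sidebar", (768, 120, 256, 1800)))
page.add(Region("footer", (0, 1920, 1024, 128)))

for depth, region in page.walk():
    print("  " * depth + f"{region.role} {region.bbox}")
```

A front-end application could traverse such a tree to, say, read only the `main_content` subtree aloud, or crop and magnify a single region for a user with low vision.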