Interactive web-wrapper construction for extracting relational information from web documents

In this paper, we propose a new user interface to interactively specify Web wrappers to extract relational information from Web documents. In this study, we focused on improving user's trial-and-error repetitions for constructing a wrapper. Our approach is a combination of a light-weight wrapper construction method and the dynamic previewing interface which quickly previews how generated wrapper works. We adopted a simple algorithm which can construct a Web wrapper from given extraction examples in less than 100 milliseconds. By using the algorithm, our system dynamically generates a new wrapper from a stream of user's mouse events for specifying extraction examples, and immediately updates a preview result that shows how the generated wrapper extracts HTML nodes from a source Web document. Through this animated display, a user can make a lot of wrapper construction trials with various different combinations of extraction examples by only moving a mouse on the Web document, and reach a good set of examples to obtain an intended wrapper in a short time.