Interactive web-wrapper construction for extracting relational information from web documents
In this paper, we propose a new user interface for interactively specifying Web wrappers that extract relational information from Web documents. We focus on speeding up the trial-and-error cycle a user goes through when constructing a wrapper. Our approach combines a lightweight wrapper construction method with a dynamic previewing interface that immediately shows how a generated wrapper behaves. We adopt a simple algorithm that constructs a Web wrapper from given extraction examples in less than 100 milliseconds. Using this algorithm, our system dynamically generates a new wrapper from the stream of mouse events with which the user specifies extraction examples, and immediately updates a preview showing which HTML nodes the generated wrapper extracts from the source Web document. Through this animated display, a user can try many different combinations of extraction examples simply by moving the mouse over the Web document, and can quickly arrive at a set of examples that yields the intended wrapper.
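The abstract does not spell out the wrapper construction algorithm, so the following is only a minimal sketch of one plausible lightweight scheme, not the authors' method. It assumes a wrapper is a root-to-node path of (tag, sibling index) steps, and that generalizing from examples means replacing every index on which the examples disagree with a wildcard. All names (`Node`, `path_of`, `induce_wrapper`, `extract`, `preview`) are hypothetical, and the hover event is simulated by passing a node directly.

```python
from html.parser import HTMLParser


class Node:
    """Minimal DOM-like node (hypothetical helper for this sketch)."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes well-formed HTML without void tags."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        self.cur = node

    def handle_endtag(self, tag):
        if self.cur.parent is not None:
            self.cur = self.cur.parent

    def handle_data(self, data):
        self.cur.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def path_of(node):
    """Root-to-node path as (tag, index among same-tag siblings) steps."""
    steps = []
    while node.parent is not None:
        same = [c for c in node.parent.children if c.tag == node.tag]
        steps.append((node.tag, same.index(node)))
        node = node.parent
    return steps[::-1]


def induce_wrapper(paths):
    """Keep an index where all examples agree; otherwise use None (wildcard)."""
    first = paths[0]
    tags = [t for t, _ in first]
    if any([t for t, _ in p] != tags for p in paths):
        raise ValueError("examples must share the same tag path in this sketch")
    return [
        (tag, idx if all(p[i][1] == idx for p in paths) else None)
        for i, (tag, idx) in enumerate(first)
    ]


def extract(root, wrapper):
    """Return every node reachable from root by following the wrapper steps."""
    frontier = [root]
    for tag, idx in wrapper:
        nxt = []
        for n in frontier:
            same = [c for c in n.children if c.tag == tag]
            if idx is None:
                nxt.extend(same)
            elif idx < len(same):
                nxt.append(same[idx])
        frontier = nxt
    return frontier


def preview(root, fixed_examples, hovered):
    """Simulated hover handler: rebuild the wrapper and return what it extracts."""
    paths = [path_of(n) for n in fixed_examples + [hovered]]
    return [n.text for n in extract(root, induce_wrapper(paths))]


HTML = ("<table><tr><td>Ann</td><td>30</td></tr>"
        "<tr><td>Bob</td><td>25</td></tr>"
        "<tr><td>Cy</td><td>41</td></tr></table>")
root = parse(HTML)
rows = root.children[0].children
first_name = rows[0].children[0]

# Hovering over a name in another row generalizes across rows.
names = preview(root, [first_name], rows[1].children[0])
# Hovering over another cell in the same row generalizes across columns.
first_row = preview(root, [first_name], rows[0].children[1])
```

In this sketch each simulated hover rebuilds the wrapper from scratch, which mirrors the idea of regenerating the wrapper on every mouse event: with the first name cell as the fixed example, hovering over a name in the second row previews the whole name column, while hovering over the age cell in the first row previews that row instead.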