Automatic repairing of web wrappers
We study the problem of automatically repairing wrappers for Web information providers. The majority of Web wrappers use "hooks" or "landmarks" to find and extract relevant information from Web pages, and such wrappers often become inoperable when the page structure changes. The solution we propose in this paper extends conventional forward wrappers with alternative classifiers, built using content features of the extracted information, and with wrappers that process pages backward. We report some preliminary results on information extraction recovery and wrapper repair for a set of real changes to Web providers.
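The idea of backing up a landmark-based forward wrapper with a backward wrapper and a content classifier can be illustrated with a minimal sketch. This is not the authors' implementation. The landmark strings, the price regex standing in for a classifier learned from content features, the function names (`forward_extract`, `backward_extract`, `extract_with_repair`) and the fallback order (forward, then backward, then a structure-free content scan) are all hypothetical assumptions made for illustration.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical content classifier: accepts strings that look like a dollar price.
# A hand-written regex stands in for a classifier learned from content features
# of previously extracted values (an assumption, not the paper's model).
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")


def looks_like_price(text: str) -> bool:
    return PRICE.fullmatch(text) is not None


def forward_extract(page: str, left: str) -> Optional[str]:
    """Forward wrapper: locate the left landmark, then read up to the next tag."""
    i = page.find(left)
    if i < 0:
        return None  # landmark gone: the forward wrapper is broken
    start = i + len(left)
    end = page.find("<", start)
    if end < 0:
        end = len(page)
    return page[start:end].strip()


def backward_extract(page: str, right: str) -> Optional[str]:
    """Backward wrapper: locate the right landmark from the end, read back to the previous tag."""
    end = page.rfind(right)
    if end < 0:
        return None
    start = page.rfind(">", 0, end) + 1  # 0 if no '>' precedes the landmark
    return page[start:end].strip()


def extract_with_repair(
    page: str, left: str, right: str, classify: Callable[[str], bool]
) -> Tuple[Optional[str], str]:
    """Try forward, then backward, then a content-only scan; validate each result with `classify`."""
    value = forward_extract(page, left)
    if value is not None and classify(value):
        return value, "forward"
    value = backward_extract(page, right)
    if value is not None and classify(value):
        return value, "backward"
    # Last resort: ignore page structure and test every text node against the classifier.
    for node in re.findall(r">([^<]+)<", page):
        if classify(node.strip()):
            return node.strip(), "content"
    return None, "failed"


if __name__ == "__main__":
    left, right = "Price:</td><td>", '</td><td class="stock">'
    old = '<td>Price:</td><td>$12.50</td><td class="stock">In stock</td>'
    changed = old.replace("Price:", "Cost:")  # left landmark no longer matches
    redesigned = "<span>Cost</span> <b>$9.99</b>"  # both landmarks gone
    print(extract_with_repair(old, left, right, looks_like_price))  # ('$12.50', 'forward')
    print(extract_with_repair(changed, left, right, looks_like_price))  # ('$12.50', 'backward')
    print(extract_with_repair(redesigned, left, right, looks_like_price))  # ('$9.99', 'content')
```

The forward and backward wrappers rely on different context (what precedes versus what follows the field), so a change on one side of the field can leave the other wrapper usable, while the classifier both validates each candidate and recovers values when neither landmark survives.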