Automatic wrapper maintenance for semi-structured web sources using results from previous queries

In recent years, considerable attention has been paid to the problem of building wrappers that extract data from semi-structured web sources. However, because web sources are autonomous, they may change in ways that invalidate existing wrappers. In this paper, we present new heuristics and algorithms for automatic wrapper maintenance. Our approach collects query results while the wrapper is operating and, when the source changes, uses them to generate new sets of examples from which a new wrapper can be induced.
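
As a rough illustration of the idea of reusing stored query results, the following minimal Python sketch caches records extracted while a wrapper still works and later searches a changed page for those values to obtain labeled examples. It is not the heuristics or algorithms of this paper: the names `ResultCache` and `label_examples`, the exact string matching, and the sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ResultCache:
    """Hypothetical store for records extracted while the wrapper still worked."""
    records: dict[str, list[dict[str, str]]] = field(default_factory=dict)

    def store(self, query: str, extracted: list[dict[str, str]]) -> None:
        # Accumulate the extracted records under the query that produced them.
        self.records.setdefault(query, []).extend(extracted)


def label_examples(
    page_text: str, cached: list[dict[str, str]]
) -> list[dict[str, tuple[int, int]]]:
    """Locate cached field values in the changed page.

    Returns one dict per usable record, mapping each field name to the
    (start, end) character span of its first occurrence. A record is kept
    only if every one of its fields is found (an assumed, simplistic rule).
    """
    examples = []
    for record in cached:
        spans = {}
        for name, value in record.items():
            start = page_text.find(value)
            if start == -1:
                break  # a field is missing, so this record is not usable
            spans[name] = (start, start + len(value))
        else:
            # Runs only when the inner loop did not break: all fields matched.
            examples.append(spans)
    return examples


# Hypothetical usage: results cached earlier, then matched against a
# re-designed result page for the same query.
cache = ResultCache()
cache.store("author=Eco", [
    {"title": "Baudolino", "price": "12.50"},
    {"title": "Numero Zero", "price": "9.99"},
])
new_page = "<li><b>Baudolino</b> <i>12.50 EUR</i></li>"
examples = label_examples(new_page, cache.records["author=Eco"])
```

The spans in `examples` could then be handed to a wrapper induction step as labeled training data. Exact substring matching is the simplest possible choice here; a real source may reformat values or drop records, which is where more careful heuristics would be needed.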