WAVE: An Incremental Algorithm for Information Extraction

This paper describes WAVE, a fully automatic, incremental induction algorithm for learning information extraction rules. Unlike traditional batch learners, WAVE learns from a stream of training instances rather than a fixed set. WAVE overcomes the inherent problems of incremental operation by maintaining a generalization hierarchy of rules. The hierarchy allows similar rules to be found efficiently, provides a natural bound on generalization, enables recall/precision trade-offs without retraining, and speeds extraction, since not every rule needs to be applied to an instance. Finally, because the reliability of rule predictions is continually updated throughout storage, the hierarchy can be used for extraction at any time. Experiments show that WAVE performs as well as CRYSTAL, a related batch algorithm, in two very different extraction domains. WAVE is also significantly faster in a simulated incremental application setting.
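The following Python sketch illustrates, in a deliberately simplified form, how a generalization hierarchy can support the properties listed above. It is not WAVE's actual algorithm: the rule representation (a set of required features), the names `RuleNode`, `update`, and `extract`, and the toy features are all assumptions made for illustration, and rule induction itself is omitted. The sketch only shows incremental reliability updates, pruning of subtrees whose parent rule does not match, and a reliability threshold that trades recall for precision without retraining.

```python
from dataclasses import dataclass, field


@dataclass
class RuleNode:
    """Hypothetical rule: a set of required features; children are more specific."""
    constraints: frozenset
    label: str = ""      # empty label marks a root that predicts nothing
    correct: int = 0     # times the rule matched and its label was right
    matched: int = 0     # times the rule matched at all
    children: list = field(default_factory=list)

    def matches(self, features):
        return self.constraints <= features

    def reliability(self):
        return self.correct / self.matched if self.matched else 0.0


def update(node, features, label):
    """Update counts for every matching rule as one training instance arrives."""
    if not node.matches(features):
        return  # children are more specific, so none of them can match either
    node.matched += 1
    node.correct += int(node.label == label)
    for child in node.children:
        update(child, features, label)


def extract(node, features, min_reliability):
    """Collect labels from matching rules whose reliability meets the threshold."""
    if not node.matches(features):
        return set()
    found = set()
    if node.label and node.reliability() >= min_reliability:
        found.add(node.label)
    for child in node.children:
        found |= extract(child, features, min_reliability)
    return found


# Toy hierarchy: root -> general rule -> more specific rule (all made up).
root = RuleNode(frozenset())
general = RuleNode(frozenset({"verb:hired"}), "employee")
specific = RuleNode(frozenset({"verb:hired", "subj:company"}), "employee")
general.children.append(specific)
root.children.append(general)

# Instances arrive one at a time; the hierarchy is usable after each update.
update(root, {"verb:hired", "subj:company"}, "employee")
update(root, {"verb:hired", "subj:person"}, "employer")

query = {"verb:hired", "subj:person"}
strict = extract(root, query, min_reliability=0.9)   # favors precision
lenient = extract(root, query, min_reliability=0.5)  # favors recall
```

In this toy run the general rule has matched twice and been right once, so it fires only under the lenient threshold, while the specific rule never matches the query. Changing `min_reliability` changes which rules fire without touching the stored counts.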