Wrapper Generation via Grammar Induction

To facilitate effective search on the World Wide Web, meta search engines have been developed which do not search the Web themselves, but use available search engines to find the required information. By means of wrappers, meta search engines retrieve information from the pages returned by search engines. We present an approach to automatically create such wrappers by means of an incremental grammar induction algorithm. The algorithm uses an adaptation of the string edit distance. Our method performs well; it is quick, can be used for several types of result pages and requires a minimal amount of user interaction.

[1]  MiningChun-Nan Hsu Finite-state Transducers for Semi-structured Text Mining , 1999 .

[2]  Chun-Nan Hsu,et al.  Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web , 1998, Inf. Syst..

[3]  Craig A. Knoblock,et al.  Wrapper generation for semi-structured Internet sources , 1997, SGMD.

[4]  Boris Chidlovskii,et al.  Towards Sophisticated Wrapping of Web-based information Repositories , 1997, RIAO.

[5]  Alfred V. Aho,et al.  Algorithms for Finding Patterns in Strings , 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[6]  Jennifer Widom,et al.  Integrating and Accessing Heterogeneous Information Sources in TSIMMIS , 1994 .

[7]  Yasubumi Sakakibara,et al.  Recent Advances of Grammatical Inference , 1997, Theor. Comput. Sci..

[8]  Hector Garcia-Molina,et al.  Extracting Semistructured Information from the Web. , 1997 .

[9]  Nicholas Kushmerick,et al.  Wrapper Induction for Information Extraction , 1997, IJCAI.

[10]  Guijun Wang,et al.  ProFusion*: Intelligent Fusion from Multiple, Distributed Search Engines , 1996, J. Univers. Comput. Sci..

[11]  Stephen Soderland,et al.  Learning to Extract Text-Based Information from the World Wide Web , 1997, KDD.

[12]  Jean-Marc Andreoli,et al.  The Constraint-Based Knowledge Broker system , 1997, Proceedings 13th International Conference on Data Engineering.

[13]  Alberto O. Mendelzon,et al.  Database techniques for the World-Wide Web: a survey , 1998, SGMD.