Learning (k, l)-Contextual Tree Languages for Information Extraction

This paper introduces a novel method for learning a wrapper that extracts text nodes from web pages, based on (k,l)-contextual tree languages. It also introduces a method for learning good values of k and l from a few positive and negative examples. Finally, it describes how the algorithm can be integrated into a tool for information extraction.
