Schema-Guided Induction of Monadic Queries

Inducing monadic node-selecting queries from partially annotated XML trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm in which monadic queries are represented by pruning node-selecting tree transducers. We present experimental results on schema guidance using the DTD of HTML.
