Bottom-Up Learning of Logic Programs for Information Extraction from Hypertext Documents

We present BFOIL, a bottom-up inductive logic programming (ILP) algorithm that synthesizes logic programs for multi-slot information extraction (IE) from hypertext documents. BFOIL learns from positive examples only and uses a logical representation of hypertext documents based on the Document Object Model (DOM). We briefly discuss several refinements of BFOIL and report very promising results for our IE system LIPX in comparison with state-of-the-art IE systems.
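The abstract does not spell out BFOIL's refinement operators. As a rough illustration of what bottom-up learning from positive examples only can look like over a DOM-based representation, the sketch below encodes each DOM element as a nested term ctx(NodeId, Tag, Class, ParentCtx). It then generalizes two-slot (name, price) positive examples with Plotkin's least general generalization (anti-unification). The term encoding, the ancestor depth of 3, the sample page, and all function names (`context_term`, `lgg`, `learn`, `extract`) are hypothetical. This is a generic bottom-up generalization step, not BFOIL or LIPX itself.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable introduced by generalization."""
    name: str


def context_term(elem, parents, ids, depth):
    """Hypothetical encoding of an element as ctx(NodeId, Tag, Class, ParentCtx),
    truncated to `depth` levels of ancestors ("top" marks the cut-off)."""
    if elem is None or depth == 0:
        return "top"
    parent_ctx = context_term(parents.get(elem), parents, ids, depth - 1)
    return ("ctx", ids[elem], elem.tag, elem.get("class", "none"), parent_ctx)


def example_term(slots, parents, ids, depth=3):
    """A multi-slot example: one context term per slot filler."""
    return ("ex",) + tuple(context_term(e, parents, ids, depth) for e in slots)


def lgg(a, b, table):
    """Plotkin's least general generalization (anti-unification) of two terms.
    The same pair of differing subterms is always mapped to the same variable."""
    if a == b:
        return a
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        return (a[0],) + tuple(lgg(x, y, table) for x, y in zip(a[1:], b[1:]))
    if (a, b) not in table:
        table[(a, b)] = Var(f"X{len(table)}")
    return table[(a, b)]


def learn(positives, parents, ids):
    """Bottom-up: start from the first positive example and generalize it
    just enough to cover each further positive example (no negatives used)."""
    rule = example_term(positives[0], parents, ids)
    for slots in positives[1:]:
        rule = lgg(rule, example_term(slots, parents, ids), {})
    return rule


def matches(pattern, term, binding):
    """One-way matching: bind the pattern's variables consistently to subterms."""
    if isinstance(pattern, Var):
        if pattern in binding:
            return binding[pattern] == term
        binding[pattern] = term
        return True
    if isinstance(pattern, tuple):
        return (isinstance(term, tuple) and len(term) == len(pattern)
                and all(matches(p, t, binding) for p, t in zip(pattern, term)))
    return pattern == term


def extract(rule, root, parents, ids):
    """Apply the learned rule to every tuple of elements; return the slot texts."""
    elems = list(root.iter())
    arity = len(rule) - 1
    return [tuple(e.text for e in combo)
            for combo in itertools.product(elems, repeat=arity)
            if matches(rule, example_term(combo, parents, ids), {})]


PAGE = """<html><body>
<table class="offers">
  <tr><td class="name">Laptop</td><td class="price">999</td></tr>
  <tr><td class="name">Phone</td><td class="price">499</td></tr>
</table>
<div class="footer"><span class="price">0</span></div>
</body></html>"""

root = ET.fromstring(PAGE)
parents = {child: parent for parent in root.iter() for child in parent}
ids = {elem: f"n{i}" for i, elem in enumerate(root.iter())}
by_text = {e.text: e for e in root.iter() if e.text and e.text.strip()}

# Two positive (name, price) examples; no negative examples are given.
positives = [(by_text["Laptop"], by_text["999"]), (by_text["Phone"], by_text["499"])]
rule = learn(positives, parents, ids)
print(extract(rule, root, parents, ids))
```

Running it prints `[('Laptop', '999'), ('Phone', '499')]`. Because `lgg` maps the same pair of differing subterms to the same variable, the row node under the name slot and the row node under the price slot share one variable. The learned pattern therefore pairs a name only with the price in the same row, and it ignores the `span.price` in the footer. Being least general, the pattern still keeps the table's node id as a constant, so it only applies to this page. A practical learner would need further refinement or a more abstract representation.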
