On Relational Learning for Information Extraction

Companies that run their business processes across heterogeneous, collaborating applications need to extract and integrate data from multiple sources. Integrating web applications is arduous, however, because they are designed for human consumption and do not provide APIs for automated access to their data. Web information extractors are used for this purpose, but most of them provide ad hoc, highly domain-dependent solutions. In this paper we aim to devise information extractors built around a FOIL-based core algorithm. FOIL is a widely used first-order rule learning algorithm: its rules are substantially more expressive than attribute-value representations and can capture complex concepts that cannot be expressed in that format. Furthermore, we focus on integrating other scoring functions to determine whether they can better guide the rule search and speed up learning, with the goal of making FOIL tractable in real-world domains such as web sources.
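
To illustrate what a scoring function does during FOIL's rule search, here is a minimal sketch of the standard FOIL information gain used to rank candidate literals. It is not the paper's implementation: the function names (`foil_gain`, `best_literal`), the candidate tuple layout, and the default for `t` are assumptions made for this example. The `score` parameter only hints at how an alternative measure could be swapped in.

```python
import math

def foil_gain(p0, n0, p1, n1, t=None):
    """FOIL information gain for extending a rule with one literal.

    p0, n0: positive/negative bindings covered before adding the literal.
    p1, n1: positive/negative bindings covered after adding it.
    t: positive bindings of the old rule still covered by the new rule.
       Assumption: defaults to p1, which holds when the literal
       introduces no new variables.
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # nothing positive is covered, so the literal is useless
    if t is None:
        t = p1
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return t * (info_before - info_after)

def best_literal(p0, n0, candidates, score=foil_gain):
    """Pick the candidate literal with the highest score.

    candidates: list of (name, p1, n1) tuples (hypothetical layout).
    score: any function with the signature of foil_gain.
    """
    return max(candidates, key=lambda c: score(p0, n0, c[1], c[2]))[0]

# A rule covering 4 positive and 4 negative bindings; literal "a" keeps
# all positives and removes all negatives, literal "b" keeps half of each.
candidates = [("a", 4, 0), ("b", 2, 2)]
chosen = best_literal(4, 4, candidates)
```

Because the search loop only depends on the signature of `score`, trying a different rule evaluation measure amounts to passing another function with the same four count arguments.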
