Refining Aggregate Conditions in Relational Learning

In relational learning, predictions for an individual are based not only on its own properties but also on the properties of a set of related individuals. Many systems use aggregates to summarize this set. Features thus introduced compare the result of an aggregate function to a threshold. We consider the case where the set to be aggregated is generated by a complex query and present a framework for refining such complex aggregate conditions along three dimensions: the aggregate function, the query used to generate the set, and the threshold value. The proposed aggregate refinement operator allows a more efficient search through the hypothesis space and thus can be beneficial for many relational learners that use aggregates. As an example application, we have implemented the refinement operator in a relational decision tree induction system. Experimental results show a significant efficiency gain in comparison with the use of a less advanced refinement operator.
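
To make the three refinement dimensions concrete, the following minimal sketch represents an aggregate condition as a triple of aggregate function, query, and threshold, and applies one refinement step along each dimension. It is an illustration only, not the refinement operator proposed here: the class `AggregateCondition`, the helper functions, the restriction to `count` and `max` with a `>=` comparison, and the toy account records are all assumptions introduced for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a related individual is a dict of numeric attributes.
Record = Dict[str, float]
Predicate = Callable[[Record], bool]


@dataclass(frozen=True)
class AggregateCondition:
    """Illustrative condition of the form: aggregate(query result) >= threshold."""
    aggregate: str                 # "count" or "max" (assumed set of functions)
    attribute: str                 # attribute aggregated when aggregate == "max"
    query: Tuple[Predicate, ...]   # conjunction of tests that generates the set
    threshold: float

    def holds(self, related: List[Record]) -> bool:
        # The query selects the subset of related individuals to aggregate.
        selected = [r for r in related if all(p(r) for p in self.query)]
        if self.aggregate == "count":
            value = len(selected)
        elif self.aggregate == "max":
            if not selected:
                return False  # assumption: max over an empty set never satisfies
            value = max(r[self.attribute] for r in selected)
        else:
            raise ValueError(f"unknown aggregate: {self.aggregate}")
        return value >= self.threshold


def refine_query(c: AggregateCondition, extra: Predicate) -> AggregateCondition:
    """Refine the query by adding one more test to the conjunction."""
    return replace(c, query=c.query + (extra,))


def refine_threshold(c: AggregateCondition, threshold: float) -> AggregateCondition:
    """Refine the threshold value."""
    return replace(c, threshold=threshold)


def refine_aggregate(c: AggregateCondition, aggregate: str) -> AggregateCondition:
    """Refine the aggregate function."""
    return replace(c, aggregate=aggregate)


# Toy data: accounts related to one customer (made up for illustration).
accounts = [
    {"balance": 500.0},
    {"balance": 1500.0},
    {"balance": 2500.0},
]

base = AggregateCondition("count", "balance", (), 2)             # count(accounts) >= 2
by_query = refine_query(base, lambda r: r["balance"] > 1000)     # count(rich accounts) >= 2
by_threshold = refine_threshold(by_query, 3)                     # count(rich accounts) >= 3
by_aggregate = refine_aggregate(by_threshold, "max")             # max(balance of rich accounts) >= 3

results = [c.holds(accounts) for c in (base, by_query, by_threshold, by_aggregate)]
```

In this toy setting, adding a test to the query can only shrink the selected set, so a `count(...) >= t` condition that fails before the query refinement also fails after it. Regularities of this kind are what a refinement operator can use to skip parts of the hypothesis space.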
