Mining Model Trees: A Multi-relational Approach

In many data mining tools that support regression tasks, training data are stored in a single table containing both the target field (dependent variable) and the attributes (independent variables). Generally, only intra-tuple relationships between the attributes and the target field are found; inter-tuple relationships are not considered, and inter-table relationships between tuples of distinct tables cannot be explored at all. Disregarding inter-table relationships can be a severe limitation in many real-world applications that involve predicting numerical values from data naturally organized in a relational model with several tables (a multi-relational model). In this paper, we present a new data mining algorithm, named Mr-SMOTI, which induces model trees from a multi-relational model. A model tree is a tree-structured prediction model whose leaves are associated with multiple linear regression models. The distinguishing feature of Mr-SMOTI is that internal nodes of the induced model tree can be of two types: regression nodes, which add a variable to some multiple linear models according to a stepwise strategy, and split nodes, which perform tests on attributes or on the join condition and thereby partition the training set. The induced model tree is a multi-relational pattern that can be represented by means of selection graphs, which can be translated into SQL or, equivalently, into first-order logic expressions.
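
To make the two node types concrete, the following is a minimal sketch of how a prediction could be read off a model tree that mixes regression nodes and split nodes. It is not the Mr-SMOTI implementation: the class names (`Leaf`, `RegressionNode`, `SplitNode`), the `predict` function, and the example tree are all hypothetical. It also assumes, for simplicity, that the example has already been flattened into a single row of numeric attributes, so tests on join conditions are not modelled, and that each regression node simply contributes one linear term to the models of all leaves beneath it.

```python
from dataclasses import dataclass
from typing import Dict, Union

Row = Dict[str, float]  # assumption: one flattened example, attribute name -> value


@dataclass
class Leaf:
    intercept: float  # constant term of the linear model reached at this leaf


@dataclass
class RegressionNode:
    variable: str       # variable added to the linear models below this node
    coefficient: float  # its coefficient (assumed already estimated)
    child: "Node"       # a regression node does not partition the data


@dataclass
class SplitNode:
    variable: str
    threshold: float
    left: "Node"   # followed when row[variable] <= threshold
    right: "Node"  # followed otherwise


Node = Union[Leaf, RegressionNode, SplitNode]


def predict(node: Node, row: Row) -> float:
    """Walk from the root to a leaf, accumulating one linear term per regression node."""
    total = 0.0
    while not isinstance(node, Leaf):
        if isinstance(node, RegressionNode):
            total += node.coefficient * row[node.variable]
            node = node.child
        else:  # SplitNode: route the example, add nothing to the model
            node = node.left if row[node.variable] <= node.threshold else node.right
    return total + node.intercept


# Hypothetical tree: the root regression node on x1 affects both leaves,
# while the regression node on x3 affects only the right branch.
tree = RegressionNode(
    "x1", 2.0,
    SplitNode("x2", 5.0, Leaf(1.0), RegressionNode("x3", -0.5, Leaf(4.0))),
)

print(predict(tree, {"x1": 3.0, "x2": 4.0, "x3": 10.0}))  # left branch
print(predict(tree, {"x1": 3.0, "x2": 6.0, "x3": 10.0}))  # right branch
```

In this sketch a regression node placed near the root has a global effect (its term appears in every leaf model below it), whereas one placed under a split has only a local effect on that partition.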
