Itemset-Based Variable Construction in Multi-relational Supervised Learning

In multi-relational data mining, data are represented in relational form: the individuals of the target table may be related to several records in secondary tables through one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. We define a space of itemset-based models in the secondary table, together with conditional density estimation of the related constructed variables. A prior distribution is defined on this model space, resulting in a parameter-free criterion to assess the relevance of the constructed variables. A greedy algorithm is then proposed to explore the space of the considered itemsets. Experiments on multi-relational datasets confirm the advantage of the approach.
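As an illustration only, and not the authors' implementation, one simple way to turn an itemset into a variable on the target table is to count, for each individual, the secondary records that satisfy every item of the itemset. The sketch below assumes this counting construction; the table, the column names, and the function `count_covered` are hypothetical.

```python
from collections import defaultdict

# Hypothetical secondary table: each record points to one individual of the
# target table through the foreign key "customer_id" (one-to-many relationship).
secondary = [
    {"customer_id": 1, "product": "book", "channel": "web"},
    {"customer_id": 1, "product": "book", "channel": "web"},
    {"customer_id": 1, "product": "pen", "channel": "store"},
    {"customer_id": 2, "product": "book", "channel": "store"},
    {"customer_id": 3, "product": "book", "channel": "web"},
    {"customer_id": 3, "product": "pen", "channel": "web"},
]


def count_covered(records, key, itemset):
    """Map each individual to the number of its secondary records
    that match every (attribute, value) pair of the itemset."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec[key]] += 0  # make sure every individual gets an entry
        if all(rec.get(attr) == val for attr, val in itemset.items()):
            counts[rec[key]] += 1
    return dict(counts)


# Assumed example itemset: a conjunction of two attribute-value items.
itemset = {"product": "book", "channel": "web"}
variable = count_covered(secondary, "customer_id", itemset)
print(variable)
```

The resulting count per individual is an ordinary numerical variable on the target table, which is the kind of object whose class-conditional relevance a criterion like the one described above would then have to assess.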
