Exploring multi-relational temporal databases with a propositional sequence miner

In this work, we introduce MuSer, a propositional framework that explores the temporal information available in multi-relational databases. At the core of the system is an encoding technique that translates this temporal information into a propositional sequence of events, which can then be explored with a propositional sequence miner. The framework mines each class partition individually and does not rely on classical aggregation strategies such as window aggregation. It also combines feature selection and propositionalization techniques to cast a multi-relational classification problem into a propositional one. We empirically evaluate MuSer on two real-world databases. The results show that mining each partition individually is a time- and memory-efficient strategy that generates a large number of highly discriminative patterns.
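The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea under stated assumptions: temporal records from several relations are flattened into one time-ordered sequence of propositional event tokens per entity, and the sequences are grouped by class label so that each partition can be mined separately. The function name `encode_sequences`, the `relation.attribute=value` token format, and the input layout are all hypothetical and are not taken from MuSer.

```python
from collections import defaultdict


def encode_sequences(labels, events):
    """Flatten multi-relational temporal records into per-class event sequences.

    labels: dict mapping entity id -> class label (hypothetical layout).
    events: iterable of (entity_id, timestamp, relation, attribute, value).
    Returns: dict mapping class label -> {entity_id: sequence}, where a
    sequence is a time-ordered list of itemsets (sorted lists of tokens).
    """
    by_entity = defaultdict(list)
    for entity_id, ts, relation, attribute, value in events:
        if entity_id not in labels:
            continue  # skip records with no known target entity
        # Hypothetical token format: one propositional item per fact.
        token = f"{relation}.{attribute}={value}"
        by_entity[entity_id].append((ts, token))

    partitions = defaultdict(dict)
    for entity_id, items in by_entity.items():
        items.sort()  # order by timestamp, then by token
        sequence = []
        last_ts = object()  # sentinel that never equals a real timestamp
        for ts, token in items:
            if ts == last_ts:
                sequence[-1].add(token)  # same time point: same itemset
            else:
                sequence.append({token})
                last_ts = ts
        partitions[labels[entity_id]][entity_id] = [sorted(s) for s in sequence]
    return dict(partitions)


if __name__ == "__main__":
    labels = {1: "pos", 2: "neg"}
    events = [
        (1, 3, "exam", "alt", "high"),
        (1, 1, "exam", "alt", "normal"),
        (1, 3, "drug", "name", "ifn"),
        (2, 2, "exam", "alt", "high"),
    ]
    for label, sequences in encode_sequences(labels, events).items():
        print(label, sequences)
```

Each per-class dictionary could then be passed on its own to an off-the-shelf sequence miner, which mirrors the per-partition mining described above without any window aggregation.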
