Predictive Sequence Miner in ILP Learning

This work presents an optimized version of XMuSer, an ILP-based framework for exploring temporal patterns in multi-relational databases. XMuSer's main idea is to exploit frequent sequence mining, an efficient method for learning temporal patterns in the form of sequences. The framework's efficiency rests on a new coding methodology for temporal data and on the use of a predictive sequence miner. The framework selects the most interesting sequential patterns and maps them into a new table, the sequence relation. In the last step of our framework, we use an ILP algorithm to learn a classification theory on the enlarged relational database, which consists of the original multi-relational database and the new sequence relation. We evaluate our framework on three classification problems, mapping in each one three different types of sequential patterns: frequent, closed, or maximal. The experiments show that our ILP-based framework gains both from the descriptive power of the ILP algorithms and from the efficiency of the sequential miners.
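To make the idea of a sequence relation concrete, the following is a minimal Python sketch under stated assumptions; it is not XMuSer's implementation. It replaces the predictive sequence miner with a brute-force frequent-subsequence enumeration, keeps only maximal patterns, and emits rows linking each example to the patterns it contains. The toy data and all names (`frequent_patterns`, `maximal`, `sequence_relation`, `min_support`, `max_len`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def frequent_patterns(sequences, min_support, max_len=3):
    """Brute-force stand-in for a sequence miner (assumption, not the paper's miner).

    Enumerates every subsequence of up to max_len items and keeps those that
    occur in at least min_support of the input sequences.
    """
    candidates = set()
    for seq in sequences.values():
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))
    support = {}
    for p in candidates:
        count = sum(is_subsequence(p, s) for s in sequences.values())
        if count >= min_support:
            support[p] = count
    return support


def maximal(patterns):
    """Keep patterns that are not a subsequence of any other frequent pattern."""
    return {
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    }


def sequence_relation(sequences, patterns):
    """Rows (example_id, pattern): which example contains which selected pattern."""
    return sorted(
        (eid, p)
        for eid, s in sequences.items()
        for p in patterns
        if is_subsequence(p, s)
    )


# Hypothetical toy data: one event sequence per example identifier.
sequences = {
    "e1": ["a", "b", "c"],
    "e2": ["a", "c"],
    "e3": ["b", "a", "c"],
}

support = frequent_patterns(sequences, min_support=2)
selected = maximal(support)
relation = sequence_relation(sequences, selected)
```

In a pipeline like the one described above, rows such as those in `relation` would be added to the database as an extra table, so that a relational learner can refer to sequential patterns as ordinary background facts.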
