Interesting Multi-relational Patterns

Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for highly simplified types of data, such as an attribute-value table or a binary database, so they are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful, and therefore often a more powerful, representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subgraphs in a representation of the database as a k-partite graph. We show how this pattern syntax is generally applicable to multi-relational data, while it reduces to well-known tiles [7] when the data is a simple binary or attribute-value table. We propose RMiner, an efficient algorithm to mine such patterns, and we introduce a method for quantifying their interestingness when contrasted with the prior information of the data miner. Finally, we illustrate the usefulness of our approach by discussing results on real-world and synthetic databases.
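The pattern syntax described in the abstract can be illustrated with a minimal sketch. It is not RMiner and is not taken from the paper. It assumes one particular reading of "complete connected subgraph": a set of entities is complete when every pair of entities whose types are linked by a relationship type is joined by an edge, and connected when the induced subgraph is connected. The names `binary_table_to_graph` and `is_complete_connected`, and the encoding of entities as `("row", i)` and `("col", j)` tuples, are hypothetical choices made for this example.

```python
from itertools import combinations


def binary_table_to_graph(table):
    """Encode a binary table as a bipartite graph (illustrative encoding).

    Rows and columns become entities of two types; a 1 in cell (i, j)
    becomes an edge between row i and column j.
    """
    type_of, edges = {}, set()
    for i, row in enumerate(table):
        type_of[("row", i)] = "row"
        for j, value in enumerate(row):
            type_of[("col", j)] = "col"
            if value:
                edges.add(frozenset((("row", i), ("col", j))))
    related_types = {frozenset(("row", "col"))}
    return type_of, related_types, edges


def is_complete_connected(entities, type_of, related_types, edges):
    """Check the assumed pattern definition on a candidate entity set."""
    entities = set(entities)
    if not entities:
        return False
    # Completeness: every pair whose types are related must share an edge.
    for a, b in combinations(entities, 2):
        if frozenset((type_of[a], type_of[b])) in related_types:
            if frozenset((a, b)) not in edges:
                return False
    # Connectedness: depth-first search restricted to the candidate set.
    start = next(iter(entities))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in entities - seen:
            if frozenset((node, other)) in edges:
                seen.add(other)
                stack.append(other)
    return seen == entities


table = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
type_of, related_types, edges = binary_table_to_graph(table)

tile = {("row", 0), ("row", 1), ("col", 0), ("col", 1)}
not_a_tile = tile | {("col", 2)}   # cell (0, 2) is 0, so completeness fails
rows_only = {("row", 0), ("row", 1)}  # complete but not connected

print(is_complete_connected(tile, type_of, related_types, edges))        # True
print(is_complete_connected(not_a_tile, type_of, related_types, edges))  # False
print(is_complete_connected(rows_only, type_of, related_types, edges))   # False
```

On a single binary table the check accepts exactly the row and column sets whose cells are all 1, which is the tile case the abstract mentions. The same function applies unchanged to more entity types by adding further pairs to `related_types`.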

[1] Mohammed J. Zaki, et al. CHARM: An Efficient Algorithm for Closed Itemset Mining, 2002, SDM.

[2] Ke Wang, et al. Mining association rules from stars, 2002, 2002 IEEE International Conference on Data Mining.

[3] Aristides Gionis, et al. Assessing data mining results via swap randomization, 2007, TKDD.

[4] Joost N. Kok, et al. Efficient Frequent Query Discovery in FARMER, 2003, PKDD.

[5] Luc De Raedt, et al. Constraint-Based Pattern Set Mining, 2007, SDM.

[6] Gemma C. Garriga, et al. Evaluating Query Result Significance in Databases via Randomizations, 2010, SDM.

[7] Bart Goethals, et al. Tiling Databases, 2004, Discovery Science.

[8] Hannu Toivonen, et al. Discovery of frequent DATALOG patterns, 1999, Data Mining and Knowledge Discovery.

[9] Arne Koopman, et al. Discovering Relational Item Sets Efficiently, 2008.

[10] Ramez Elmasri, et al. Fundamentals of Database Systems, 1989.

[11] Ira Assent, et al. CLICKS: an effective algorithm for mining subspace clusters in categorical datasets, 2005, KDD '05.

[12] Jiawei Han, et al. gSpan: graph-based substructure pattern mining, 2002, 2002 IEEE International Conference on Data Mining.

[13] Thomas M. Cover, et al. Elements of Information Theory, Second Edition, 2005.

[14] George Karypis, et al. Frequent subgraph discovery, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[15] Mohammed J. Zaki, et al. Efficient algorithms for mining closed itemsets and their lattice structure, 2005, IEEE Transactions on Knowledge and Data Engineering.

[16] Tijl De Bie, et al. Maximum entropy models and subjective interestingness: an application to tiles in binary databases, 2010, Data Mining and Knowledge Discovery.

[17] Tijl De Bie, et al. A framework for mining interesting pattern sets, 2010, UP '10.

[18] Heikki Mannila, et al. Tell me something I don't know: randomization strategies for iterative data mining, 2009, KDD.

[19] Arne Koopman, et al. Discovering Relational Item Sets Efficiently, 2008, SDM.

[20] Bart Goethals, et al. Mining interesting sets and rules in relational databases, 2010, SAC '10.

[21] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[22] Mohammed J. Zaki, et al. Theoretical Foundations of Association Rules, 2007.

[23] H. C. Johnston. Cliques of a graph-variations on the Bron-Kerbosch algorithm, 2004, International Journal of Computer & Information Sciences.

[24] Jilles Vreeken, et al. Item Sets that Compress, 2006, SDM.

[25] Arne Koopman. Characteristic relational patterns, 2009, KDD.

[26] Howard J. Hamilton, et al. Interestingness measures for data mining: A survey, 2006, CSUR.