Direct Mining of Subjectively Interesting Relational Patterns

Data is typically complex and relational. Therefore, the development of relational data mining methods is an increasingly active topic of research. Recent work has resulted in new formalisations of patterns in relational data and in a way to quantify their interestingness subjectively, taking into account the data analyst's prior beliefs about the data. Yet a scalable algorithm to find the most interesting such patterns is lacking. We introduce a new algorithm based on two notions: (1) the use of Constraint Programming, which results in notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search; and (2) the direct search for only the most interesting patterns, instead of exhaustively enumerating patterns and then ranking them. Through empirical evaluation, we find that the novel bounds used in our branch-and-bound search yield speedups of up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively interesting relational patterns in databases where this was previously impractical or impossible.
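The contrast between enumerating and then ranking patterns, and searching directly for the best one with branch-and-bound, can be illustrated with a toy sketch. It is not the authors' algorithm: it uses plain Python recursion instead of Constraint Programming, single-table itemsets instead of relational patterns, and a hypothetical stand-in score (itemset size times support, sometimes called tile area) instead of subjective interestingness, whose bounds the paper derives. The only point it shows is how an optimistic upper bound on the score of every extension lets the search skip subtrees that cannot beat the best pattern found so far. The function names `area`, `enumerate_then_rank` and `branch_and_bound` are illustrative only.

```python
from itertools import combinations


def area(itemset, transactions):
    """Hypothetical stand-in score: itemset size times support (tile area)."""
    return len(itemset) * sum(1 for t in transactions if itemset <= t)


def enumerate_then_rank(transactions, items):
    """Baseline: score every non-empty itemset, then keep the best one."""
    scored = [
        (area(frozenset(c), transactions), frozenset(c))
        for r in range(1, len(items) + 1)
        for c in combinations(sorted(items), r)
    ]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def branch_and_bound(transactions, items):
    """Direct search for the single highest-scoring itemset, with pruning."""
    best, best_score, nodes = frozenset(), 0, 0

    def search(current, candidates, covering):
        nonlocal best, best_score, nodes
        nodes += 1
        score = len(current) * len(covering)
        if score > best_score:
            best, best_score = current, score
        # Optimistic bound: any extension J of `current` uses only items from
        # `candidates`, so |J| <= |current| + |candidates|, and its support
        # can only shrink, so support(J) <= len(covering).
        if len(covering) * (len(current) + len(candidates)) <= best_score:
            return  # no extension can strictly beat the incumbent: prune
        for i, item in enumerate(candidates):
            narrowed = [t for t in covering if item in t]
            if narrowed:  # extensions with zero support score 0; skip them
                search(current | {item}, candidates[i + 1:], narrowed)

    search(frozenset(), sorted(items), list(transactions))
    return best, best_score, nodes


# Toy example data (hypothetical): five transactions over items a, b, c, d.
transactions = [frozenset(t) for t in ("abc", "ab", "abcd", "bc", "abd")]
items = set().union(*transactions)
print(enumerate_then_rank(transactions, items))
print(branch_and_bound(transactions, items))
```

On this five-transaction example both functions find {a, b} with score 8, while the branch-and-bound search visits 13 search nodes (including the empty root) instead of the 16 nodes of the full itemset lattice. A toy of this size does not reproduce the speedups reported for real relational data; it only shows where the pruning happens.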