P-N-RMiner: a generic framework for mining interesting structured relational patterns

Methods for local pattern mining are fragmented along two dimensions: the pattern syntax and the data types to which they apply. Pattern syntaxes include subgroups, n-sets, itemsets, and many more; common data types include binary, categorical, and real-valued. Recent research on relational pattern mining has shown how these pattern syntaxes can be unified in a single framework. However, a unified model for handling various data types is lacking, particularly for more complexly structured types such as real numbers, time of day (which is circular), geographical location, and terms from a taxonomy. We introduce P-N-RMiner, a generic tool for mining interesting local patterns in (relational) data with structured attributes. We show how to handle the attribute structures in a generic manner by modelling them as partial orders. We also derive an information-theoretic subjective interestingness measure for such patterns and present an algorithm to enumerate them efficiently. We find that (1) P-N-RMiner finds patterns that are substantially more informative, (2) the new interestingness measure cannot be approximated using existing methods, and (3) the partial orders can be leveraged to speed up enumeration.
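To make the idea of modelling attribute structures as partial orders more concrete, the following minimal Python sketch shows one way three of the attribute types mentioned above could be ordered by containment or generality. It is an illustration only and not the P-N-RMiner implementation: the function names (`interval_leq`, `arc_leq`, `taxonomy_leq`), the tuple encodings, and the choice that "a <= b" means "a is at least as specific as b" are all assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encodings chosen for this sketch (not taken from the paper):
Interval = Tuple[float, float]  # closed interval [lo, hi] with lo <= hi
Arc = Tuple[float, float]       # (start hour in [0, 24), length in hours)


def interval_leq(a: Interval, b: Interval) -> bool:
    """Real-valued attribute: a <= b if interval a lies inside interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def arc_leq(a: Arc, b: Arc, period: float = 24.0) -> bool:
    """Circular attribute (e.g. time of day): a <= b if arc a lies inside arc b."""
    (a_start, a_len), (b_start, b_len) = a, b
    if b_len >= period:  # b covers the whole circle
        return True
    # Position of a's start measured clockwise from b's start.
    offset = (a_start - b_start) % period
    return offset + a_len <= b_len


def taxonomy_leq(a: str, b: str, parent: Dict[str, Optional[str]]) -> bool:
    """Taxonomy attribute: a <= b if b equals a or is an ancestor of a."""
    node: Optional[str] = a
    while node is not None:
        if node == b:
            return True
        node = parent.get(node)
    return False


if __name__ == "__main__":
    # The window 23:00-01:00 lies inside 22:00-02:00 despite wrapping past midnight.
    print(arc_leq((23.0, 2.0), (22.0, 4.0)))
    # A toy taxonomy, invented for this example.
    toy_parent = {"espresso": "coffee", "coffee": "drink", "drink": None}
    print(taxonomy_leq("espresso", "drink", toy_parent))
```

Each relation is reflexive, antisymmetric, and transitive on its domain (for arcs, provided the full circle is represented by a single canonical element), so a single generic routine written against such a comparison could, in principle, treat all three attribute types uniformly.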
