Subjective Interestingness in Exploratory Data Mining

Exploratory data mining aims to help users improve their understanding of the data. Given this aim, it seems self-evident that optimizing the process requires considering both the data and the user. Yet the vast majority of exploratory data mining methods, including most methods for clustering, itemset and association rule mining, subgroup discovery, and dimensionality reduction, formalize the interestingness of patterns objectively, disregarding the user altogether. More often than not, this leads to subjectively uninteresting patterns being reported. Here I will discuss a general mathematical framework for formalizing interestingness in a subjective manner. I will further demonstrate how it can be successfully instantiated for a variety of exploratory data mining problems. Finally, I will highlight some connections to other work and outline some of the challenges and research opportunities ahead.
