A Theoretical Framework for Exploratory Data Mining: Recent Insights and Challenges Ahead

Exploratory Data Mining (EDM), the contemporary heir of Exploratory Data Analysis (EDA) pioneered by Tukey in the seventies, is the task of facilitating the extraction of interesting nuggets of information from possibly large and complexly structured data. Two major conceptual challenges in EDM research are how to formalise a nugget of information (given the diversity of data types of interest), and how to formalise how interesting such a nugget is to a particular user (given the diversity of users and intended purposes). In this Nectar paper we briefly survey a number of recent contributions made by us and our collaborators towards a theoretically motivated and practically usable resolution of these challenges.
