PRIIME: A generic framework for interactive personalized interesting pattern discovery

The traditional frequent pattern mining algorithms generate an exponentially large number of patterns of which a substantial proportion are not much significant for many data analysis endeavors. Discovery of a small number of personalized interesting patterns from the large output set according to a particular user's interest is an important as well as challenging task. Existing works on pattern summarization do not solve this problem from the personalization viewpoint. In this work, we propose an interactive pattern discovery framework named PRIIME which identifies a set of interesting patterns for a specific user without requiring any prior input on the interestingness measure of patterns from the user. The proposed framework is generic to support discovery of the interesting set, sequence and graph type patterns. We develop a softmax classification based iterative learning algorithm that uses a limited number of interactive feedback from the user to learn her interestingness profile, and use this profile for pattern recommendation. To handle sequence and graph type patterns PRIIME adopts a neural net (NN) based unsupervised feature construction approach. We also develop a strategy that combines exploration and exploitation to select patterns for feedback. We show experimental results on several real-life datasets to validate the performance of the proposed method. We also compare with the existing methods of interactive pattern discovery to show that our method is substantially superior in performance. To portray the applicability of the framework, we present a case study from the real-estate domain.

[1]  Mark Craven,et al.  Multiple-Instance Active Learning , 2007, NIPS.

[2]  Nick Cramer,et al.  Automatic Keyword Extraction from Individual Documents , 2010 .

[3]  Christian Penner The Digital House Hunt: Consumer and Market Trends in Real Estate , 2013 .

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[6]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[7]  Stefan Wrobel,et al.  One click mining: interactive local pattern discovery through implicit preference and performance learning , 2013, IDEA@KDD.

[8]  Snehasis Mukhopadhyay,et al.  Interactive pattern mining on hidden data: a sampling-based solution , 2012, CIKM.

[9]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[10]  Geng Li,et al.  Effective graph classification based on topological and label attributes , 2012, Stat. Anal. Data Min..

[11]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[12]  Antonio Gomariz,et al.  SPMF: a Java open-source pattern mining library , 2014, J. Mach. Learn. Res..

[13]  Tijl De Bie,et al.  Maximum entropy models and subjective interestingness: an application to tiles in binary databases , 2010, Data Mining and Knowledge Discovery.

[14]  Hiroki Arimura,et al.  LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets , 2004, FIMI.

[15]  Petr Sojka,et al.  Software Framework for Topic Modelling with Large Corpora , 2010 .

[16]  Matthijs van Leeuwen,et al.  Interactive Discovery of Interesting Subgroup Sets , 2013, IDA.

[17]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[18]  Mohammad Al Hasan,et al.  Output Space Sampling for Graph Patterns , 2009, Proc. VLDB Endow..

[19]  Jiawei Han,et al.  Discovering interesting patterns through user's interactive feedback , 2006, KDD '06.

[20]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[21]  Pauli Miettinen,et al.  Siren: an interactive tool for mining and visualizing geospatial redescriptions , 2012, KDD.

[22]  Alexandre Termier,et al.  Interactive User Group Analysis , 2015, CIKM.

[23]  Marc Boullé,et al.  Exploration vs. exploitation in active learning : A Bayesian approach , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[24]  Luc De Raedt,et al.  Don't Be Afraid of Simpler Patterns , 2006, PKDD.

[25]  Ralf Klinkenberg,et al.  Data Classification: Algorithms and Applications , 2014 .

[26]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[27]  Mohammed J. Zaki Data Mining and Analysis: Fundamental Concepts and Algorithms , 2014 .

[28]  Jilles Vreeken,et al.  Summarizing data succinctly with the most informative itemsets , 2012, TKDD.

[29]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Jilles Vreeken,et al.  Krimp: mining itemsets that compress , 2011, Data Mining and Knowledge Discovery.

[31]  Bart Goethals,et al.  MIME: a framework for interactive visual pattern mining , 2011, KDD.

[32]  Teofilo F. GONZALEZ,et al.  Clustering to Minimize the Maximum Intercluster Distance , 1985, Theor. Comput. Sci..

[33]  Jilles Vreeken,et al.  Tell me what i need to know: succinctly summarizing data with itemsets , 2011, KDD.

[34]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[35]  Boris Cule,et al.  Itemset Based Sequence Classification , 2013, ECML/PKDD.