Towards a Framework for Semantic Exploration of Frequent Patterns

Mining frequent patterns is an essential task in discovering hidden correlations in datasets. Although frequent patterns unveil valuable information, several challenges limit their usability. First, the number of possible patterns is often very large, which hinders their effective exploration. Second, patterns with many items are hard to read, and the analyst may be unable to understand their meaning. In addition, the only information available about a pattern is its support, which is a very coarse measure. In this paper, we are particularly interested in mining datasets that reflect the usage patterns of users moving in space and time and for whom demographic attributes are available (age, occupation, etc.). Such characteristics are typical of data collected from smartphones, whose analysis has critical business applications nowadays. We propose two pattern exploration primitives, abstraction and refinement, that use hand-crafted taxonomies on time, space and user demographics. We show on two real datasets, Nokia and MovieLens, how the use of such taxonomies reduces the size of the pattern space and how demographics enable its semantic exploration. This work opens new perspectives in the semantic exploration of frequent patterns that reflect the behavior of different user communities.
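The abstract does not define the two primitives, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that abstraction replaces each item by its parent in a taxonomy and that refinement returns the raw transactions behind an abstracted pattern. The taxonomy, the item names, the toy transactions and the functions `abstract`, `refine` and `frequent_itemsets` are all hypothetical, and the brute-force enumeration stands in for a real frequent itemset miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hand-crafted taxonomy: each raw item maps to a parent concept.
TAXONOMY = {
    "time=08:15": "time=morning",
    "time=09:40": "time=morning",
    "time=19:05": "time=evening",
    "place=cafe": "place=food",
    "place=restaurant": "place=food",
}

# Hypothetical toy transactions (one per user event).
TRANSACTIONS = [
    frozenset({"time=08:15", "place=cafe"}),
    frozenset({"time=09:40", "place=restaurant"}),
    frozenset({"time=19:05", "place=cafe"}),
]


def abstract(transaction, taxonomy):
    """Assumed abstraction: replace each item by its taxonomy parent.

    Items without a parent in the taxonomy are kept unchanged.
    """
    return frozenset(taxonomy.get(item, item) for item in transaction)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with absolute support >= min_support.

    Only suitable for tiny examples; a real miner would be used in practice.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, len(t) + 1):
            for subset in combinations(sorted(t), k):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def refine(pattern, transactions, taxonomy):
    """Assumed refinement: raw transactions whose abstraction contains the pattern."""
    return [t for t in transactions if pattern <= abstract(t, taxonomy)]


abstracted_tx = [abstract(t, TAXONOMY) for t in TRANSACTIONS]
raw_patterns = frequent_itemsets(TRANSACTIONS, min_support=1)
abstracted_patterns = frequent_itemsets(abstracted_tx, min_support=1)

print(len(raw_patterns), "raw patterns vs", len(abstracted_patterns), "abstracted")
morning_food = frozenset({"time=morning", "place=food"})
print("support of morning+food:", abstracted_patterns[morning_food])
print("refined to:", refine(morning_food, TRANSACTIONS, TAXONOMY))
```

On this toy input the abstracted data yields fewer distinct patterns than the raw data, and the abstracted pattern over morning and food can be refined back to the two raw transactions that support it. Whether the paper's primitives behave this way on the Nokia and MovieLens datasets cannot be determined from the abstract alone.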
