Interactive User Group Analysis

User data is becoming increasingly available in multiple domains, ranging from phone usage traces to data on the social Web. The analysis of user data appeals to scientists who work on population studies, recommendations, and large-scale data analytics. We argue for the need for interactive analysis to understand the multiple facets of user data and to address different analytics scenarios. Since user data is often sparse and noisy, we propose to produce labeled groups that describe users with common properties, and we develop IUGA, an interactive framework based on group discovery primitives for exploring the user space. At each step of IUGA, an analyst visualizes the members of a group, may act on the group (add or remove members), and chooses an operation (exploit or explore) to discover more groups and hence more users. Each discovery operation returns the k most relevant and diverse groups. We formulate group exploitation and exploration as optimization problems and devise greedy algorithms for efficient group discovery. Finally, we design a principled validation methodology and run extensive experiments that validate the effectiveness of IUGA on large datasets across different user space analysis scenarios.
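The abstract does not spell out the objective functions, so the following is only a minimal Python sketch of what a greedy "k most relevant and diverse groups" step could look like, not IUGA's actual algorithm. Assumptions (all hypothetical): a group is a frozenset of user IDs; relevance to the analyst's current group is Jaccard similarity for exploit and one minus Jaccard for explore; diversity is the distance to the closest already-selected group; and a weight `lam` trades the two off, in the spirit of maximal marginal relevance. The names `jaccard` and `greedy_discover` are illustrative.

```python
from typing import FrozenSet, List

Group = FrozenSet[int]  # hypothetical representation: a group is a set of user IDs


def jaccard(a: Group, b: Group) -> float:
    """Jaccard similarity of two groups (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_discover(seed: Group, candidates: List[Group], k: int,
                    lam: float = 0.5, explore: bool = False) -> List[Group]:
    """Greedily pick up to k candidate groups, balancing relevance and diversity.

    Hypothetical scoring, not taken from the paper:
      exploit -> relevance = overlap with the seed group (Jaccard)
      explore -> relevance = 1 - overlap, favoring users outside the seed
      score   = lam * relevance + (1 - lam) * diversity
    """
    def relevance(g: Group) -> float:
        sim = jaccard(seed, g)
        return 1.0 - sim if explore else sim

    selected: List[Group] = []
    pool = [g for g in candidates if g != seed]
    while pool and len(selected) < k:
        def score(g: Group) -> float:
            # Diversity: distance to the most similar group picked so far
            # (1.0 for the first pick, when nothing is selected yet).
            div = min((1.0 - jaccard(g, s) for s in selected), default=1.0)
            return lam * relevance(g) + (1.0 - lam) * div

        best = max(pool, key=score)  # ties go to the earliest candidate
        selected.append(best)
        pool.remove(best)
    return selected


if __name__ == "__main__":
    seed = frozenset({1, 2, 3, 4})
    cands = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 5}),
             frozenset({7, 8, 9}), frozenset({3, 4, 10})]
    print(greedy_discover(seed, cands, k=2))                # exploit step
    print(greedy_discover(seed, cands, k=1, explore=True))  # explore step
```

In this toy run, exploit first picks `{1, 2, 3}` (closest to the seed), then prefers `{3, 4, 10}` over the near-duplicate `{1, 2, 3, 5}` because the diversity term penalizes redundancy; explore picks the disjoint `{7, 8, 9}`. An analyst action such as removing a member would simply change `seed` before the next call.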
