Interactive User Group Analysis Technical Report

User data is becoming increasingly available in multiple domains, ranging from phone usage traces to data on the social Web. Analyzing this data appeals to scientists who work on population studies, recommendation, and large-scale data analytics. We argue that interactive analysis is needed to understand the multiple facets of user data and to address different analytics scenarios. Because user data is often sparse and noisy, we propose to produce labeled groups that describe users with common properties, and we develop IUGA, an interactive framework based on group discovery primitives for exploring the user space. At each step of IUGA, an analyst visualizes the members of a group, may take an action on the group (add or remove members), and chooses an operation (exploit or explore) to discover more groups and hence more users. Each discovery operation returns the k most relevant and diverse groups. We formulate group exploitation and exploration as optimization problems and devise greedy algorithms that enable efficient group discovery. Finally, we design a principled validation methodology and run extensive experiments that validate the effectiveness of IUGA on large datasets for different user space analysis scenarios.
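To make the idea of a discovery operation concrete, the following is a minimal sketch of one greedy step that picks k groups that are both relevant to the current group and diverse among themselves. It is not the report's algorithm: the abstract does not state the objective, so the Jaccard-based relevance, the redundancy penalty (in the style of maximal marginal relevance), the `trade_off` parameter, and the names `jaccard` and `greedy_select` are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity between two sets of user ids (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_select(
    current: FrozenSet[int],
    candidates: Dict[str, FrozenSet[int]],
    k: int,
    trade_off: float = 0.5,  # hypothetical knob: 1.0 = relevance only
) -> List[str]:
    """Greedily pick up to k group labels.

    Assumed objective (not taken from the report): at each step, choose the
    group that maximizes
        trade_off * relevance - (1 - trade_off) * redundancy
    where relevance is the Jaccard similarity to the current group and
    redundancy is the highest Jaccard similarity to an already selected group.
    """
    selected: List[str] = []
    remaining = dict(candidates)

    while remaining and len(selected) < k:

        def score(label: str) -> float:
            members = remaining[label]
            relevance = jaccard(current, members)
            redundancy = max(
                (jaccard(members, candidates[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        # Iterate labels in sorted order so that ties break deterministically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        del remaining[best]

    return selected


# Toy data: group labels map to sets of user ids (all values are made up).
current_group = frozenset({1, 2, 3, 4})
candidate_groups = {
    "young, Paris": frozenset({1, 2, 3, 5}),
    "young, Lyon": frozenset({1, 2, 3, 6}),
    "students": frozenset({3, 4, 7, 8}),
    "retired": frozenset({9, 10}),
}

balanced = greedy_select(current_group, candidate_groups, k=2)
relevance_only = greedy_select(current_group, candidate_groups, k=2, trade_off=1.0)
```

On this toy input, the balanced setting skips the second "young" group because it overlaps heavily with the first pick, while the relevance-only setting returns both. A greedy loop of this kind trades optimality for speed, which suits an interactive setting where each step must return quickly.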
