User group analytics: hypothesis generation and exploratory analysis of user data

User data is becoming increasingly available in domains ranging from the social Web to retail store receipts. It consists of user demographics (e.g., age, gender, occupation) and user actions (e.g., rating a movie, publishing a paper, following a medical treatment). The analysis of user data appeals to scientists who work on population studies, online marketing, recommendations, and large-scale data analytics. User data analytics usually relies on identifying group-level behavior, such as that of “Asian women who publish regularly in databases.” Group analytics addresses peculiarities of user data, such as noise and sparsity, to enable insights. In this paper, we introduce a framework for user group analytics, developing several components that cover the life cycle of user groups. We provide two analytical environments that support “hypothesis generation” and “exploratory analysis” on user groups. Experiments on datasets with different characteristics show the usability and efficiency of our group analytics framework.
