Data Pipelines for User Group Analytics

User data is becoming increasingly available in domains ranging from the social Web to electronic health records (EHRs). User data combines demographics (e.g., age, gender, life status) with user actions (e.g., posting a tweet, following a diet). Domain experts rely on user data to conduct large-scale population studies. Information consumers, on the other hand, rely on user data for routine tasks such as finding a book club or getting advice from look-alike patients. User data analytics is usually based on identifying group-level behaviors such as "teenage females who watch Titanic" and "old male patients in Paris who suffer from bronchitis." In this tutorial, we review data pipelines for User Group Analytics (UGA). These pipelines take raw user data as input and return insights in the form of user groups. We review research on UGA pipelines and discuss approaches and open challenges for discovering, exploring, and visualizing user groups. Throughout the tutorial, we will present examples from two key domains: the social Web and health care.
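As a rough illustration of the input and output of such a pipeline, the following minimal sketch treats a user group as a conjunction of attribute-value pairs (for example, age, gender, and an action) together with the users it covers. The toy records, the attribute names, the `discover_groups` function, and the `min_size` threshold are all hypothetical and chosen only for illustration; they do not correspond to any specific system covered in the tutorial, and exhaustive enumeration like this would not scale to real data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: demographics plus one action per user.
users = [
    {"id": 1, "age": "teen", "gender": "female", "action": "watch:Titanic"},
    {"id": 2, "age": "teen", "gender": "female", "action": "watch:Titanic"},
    {"id": 3, "age": "old", "gender": "male", "action": "watch:Titanic"},
    {"id": 4, "age": "teen", "gender": "male", "action": "watch:Alien"},
]


def discover_groups(records, attributes, min_size=2):
    """Map each conjunction of attribute=value pairs to the user ids it covers.

    Only groups covering at least `min_size` users are kept.
    """
    groups = defaultdict(set)
    for rec in records:
        pairs = [(a, rec[a]) for a in attributes]
        # Enumerate every non-empty subset of this user's attribute-value pairs.
        for k in range(1, len(pairs) + 1):
            for description in combinations(pairs, k):
                groups[description].add(rec["id"])
    return {d: ids for d, ids in groups.items() if len(ids) >= min_size}


groups = discover_groups(users, ["age", "gender", "action"])

# Print groups from largest to smallest, one description per line.
for description, ids in sorted(groups.items(), key=lambda g: -len(g[1])):
    label = " AND ".join(f"{a}={v}" for a, v in description)
    print(label, "->", sorted(ids))
```

In this sketch, the group described by age=teen, gender=female, and action=watch:Titanic plays the role of "teenage females who watch Titanic" from the text above. The discovery approaches reviewed in the tutorial replace the brute-force enumeration with techniques that cope with large attribute spaces.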
