Towards Plug-and-Play Visual Graph Query Interfaces: Data-driven Canned Pattern Selection for Large Networks

Canned patterns (i.e., small subgraph patterns) in visual graph query interfaces (a.k.a. GUIs) facilitate efficient query formulation by enabling a pattern-at-a-time construction mode. However, existing GUIs for querying large networks either expose no canned patterns or, when they do, rely on patterns selected manually based on domain knowledge. Manual generation of canned patterns is not only labor-intensive but may also lack the diversity needed to support efficient visual formulation of a wide range of subgraph queries. In this paper, we present a novel, generic, and extensible framework called TATTOO that takes a data-driven approach to automatically selecting canned patterns for a GUI from large networks. Specifically, it first decomposes the underlying network into truss-infested and truss-oblivious regions. Candidate canned patterns capturing different real-world query topologies are then generated from these regions. Finally, canned patterns are selected for the GUI from these candidates, based on a user-specified plug, by maximizing coverage and diversity and by minimizing the cognitive load of the pattern set. Experimental studies with real-world datasets demonstrate the benefits of TATTOO. Importantly, this work takes a concrete step towards realizing plug-and-play visual graph query interfaces for large networks.
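
The decomposition step mentioned above can be illustrated with a minimal sketch. It assumes, since the abstract does not say so, that the truss-infested region consists of the edges that survive k-truss peeling with k = 3 (every surviving edge lies in at least k - 2 triangles of the surviving subgraph) and that the truss-oblivious region is everything else. The function names `k_truss_edges` and `split_regions` are hypothetical and are not taken from TATTOO; the simple repeated-scan peeling shown here is for illustration only and is not tuned for large networks.

```python
def k_truss_edges(edges, k=3):
    """Return the edges that survive k-truss peeling (illustrative, not optimized)."""
    # Normalize undirected edges as sorted tuples and drop self-loops.
    alive = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    adj = {}
    for u, v in alive:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u, v in list(alive):
            # Support = number of triangles the edge closes in the current subgraph.
            support = len(adj[u] & adj[v])
            if support < k - 2:
                alive.discard((u, v))
                adj[u].discard(v)
                adj[v].discard(u)
                changed = True
    return alive


def split_regions(edges, k=3):
    """Split edges into (truss-infested, truss-oblivious) sets under the stated assumption."""
    all_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    infested = k_truss_edges(all_edges, k)
    return infested, all_edges - infested


# A triangle (1, 2, 3) with a path 3-4-5 hanging off it.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
infested, oblivious = split_regions(toy_edges)
print(sorted(infested))   # triangle edges
print(sorted(oblivious))  # path edges
```

On the toy graph, the triangle edges land in the truss-infested set and the two path edges in the truss-oblivious set. Candidate generation and the coverage, diversity, and cognitive-load scoring are not sketched here because the abstract does not define them.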
