Rapid Sequence Matching for Visualization Recommender Systems

We present a method to support high-quality visualization recommendations for analytic tasks. Visualization converts large datasets into images that allow viewers to efficiently explore, discover, and validate patterns within their data. Visualization recommenders have been proposed that store past sequences (ordered collections of design choices that led to successful task completion) and match them against a visualization that is still under construction. Based on this matching, the system recommends visualizations that better support the analysts' tasks. Scalability becomes a problem when many sequences are stored. One solution would be to index the sequence database. However, a conventional index supports exact-match lookup, while matching requires sequences that are similar to the partially constructed visualization, not only those that are identical. We implement a locality-sensitive hashing (LSH) algorithm that converts visualizations into set representations, then uses Jaccard similarity to store similar sequence nodes in common hash buckets. This allows us to match partial sequences against a database containing tens of thousands of full sequences in under 100 ms. Experiments show that our algorithm locates 95% or more of the sequences found by an exhaustive search, producing high-quality visualization recommendations.
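The general idea of hashing set representations so that similar ones share a bucket can be illustrated with MinHash signatures and banded LSH, the standard LSH family for Jaccard similarity. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature strings (such as `mark=bar` or `x=month`), the class name `MinHashLSH`, and the parameters (20 bands of 5 rows, SHA-1 based feature hashing, linear hash functions modulo a Mersenne prime) are all hypothetical choices made for illustration.

```python
import hashlib
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def stable_hash(token: str) -> int:
    # Deterministic 64-bit hash of a feature string
    # (Python's built-in hash() is salted per process, so it is avoided here).
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def jaccard(a: set, b: set) -> float:
    # Exact Jaccard similarity |A ∩ B| / |A ∪ B|, used to verify candidates.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class MinHashLSH:
    """Hypothetical MinHash + banding index over set-represented visualization nodes."""

    def __init__(self, num_bands: int = 20, rows_per_band: int = 5, seed: int = 42):
        self.b, self.r = num_bands, rows_per_band
        rng = random.Random(seed)
        # One random linear hash h(x) = (a*x + c) mod PRIME per signature position.
        self.coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                       for _ in range(num_bands * rows_per_band)]
        # One hash table per band: band key -> set of node ids.
        self.buckets = [defaultdict(set) for _ in range(num_bands)]

    def signature(self, features: set) -> list:
        # MinHash signature: for each hash function, the minimum hash over the set.
        if not features:
            raise ValueError("a node must have at least one feature")
        hashes = [stable_hash(f) for f in features]
        return [min((a * h + c) % PRIME for h in hashes) for a, c in self.coeffs]

    def _band_keys(self, sig: list):
        # Split the signature into b bands of r rows; each band is a bucket key.
        for i in range(self.b):
            yield i, tuple(sig[i * self.r:(i + 1) * self.r])

    def insert(self, node_id: str, features: set) -> None:
        for i, key in self._band_keys(self.signature(features)):
            self.buckets[i][key].add(node_id)

    def query(self, features: set) -> set:
        # Candidates are all nodes sharing at least one band bucket with the query.
        candidates = set()
        for i, key in self._band_keys(self.signature(features)):
            candidates |= self.buckets[i].get(key, set())
        return candidates


if __name__ == "__main__":
    index = MinHashLSH()
    index.insert("seq1/node3", {"mark=bar", "x=month", "y=sales", "color=region"})
    index.insert("seq2/node1", {"mark=map", "geo=county", "color=population"})

    partial = {"mark=bar", "x=month", "y=sales"}  # partially constructed visualization
    print(index.query(partial))
    print(jaccard(partial, {"mark=bar", "x=month", "y=sales", "color=region"}))  # 0.75
```

With b bands of r rows, a pair of sets with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b, assuming the hash functions behave as min-wise independent. For the partial visualization above (s = 0.75, b = 20, r = 5) that is roughly 0.996, while a pair with s = 0.3 collides with probability of only about 0.05. Increasing b raises recall at the cost of more candidates to verify with the exact `jaccard` check.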
