VizCommender: Computing Text-Based Similarity in Visualization Repositories for Content-Based Recommendations

Cloud-based visualization services have made visual analytics accessible to a much wider audience than ever before. Systems such as Tableau have started to amass increasingly large repositories of analytical knowledge in the form of interactive visualization workbooks. When shared, these collections can form a visual analytic knowledge base. However, as a collection grows, so does the difficulty of finding relevant information in it. Content-based recommendation (CBR) systems could help analysts find and manage workbooks relevant to their interests. Toward this goal, we focus on text-based content that is representative of the subject matter of visualizations rather than their visual encodings and style. We discuss the challenges associated with creating a CBR system based on visualization specifications and explore more concretely how to implement the required relevance measures using Tableau workbook specifications as the source of content data. We also demonstrate what information can be extracted from these visualization specifications and how various natural language processing techniques can be used to compute similarity between workbooks as one way to measure relevance. We report on a crowd-sourced user study to determine whether our similarity measure mimics human judgement. Finally, we choose latent Dirichlet allocation (LDA) as a specific model and instantiate it in a proof-of-concept recommender tool to demonstrate the basic function of our similarity measure.
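
To make the idea of text-based similarity between workbooks concrete, the following is a minimal sketch, not the method described above. It assumes each workbook has already been reduced to a single string of extracted text, and it swaps in TF-IDF weighting with cosine similarity in place of LDA so that it runs with the Python standard library alone. The function names and the sample workbook texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for very common terms.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query_index, docs):
    """Rank all other documents by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[query_index], v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical text extracted from three workbooks (titles, field names, captions).
workbooks = [
    "Quarterly sales revenue by region and product category",
    "Regional sales and revenue dashboard by product",
    "Hospital patient admissions and wait times by department",
]

ranked = rank_similar(0, workbooks)
for index, score in ranked:
    print(index, round(score, 3))
```

In this sketch the second sales workbook ranks above the hospital workbook for the first query because it shares more subject-matter terms. A topic model such as LDA would instead compare documents through their inferred topic distributions, which can capture relatedness even when the exact words differ.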
