Nugget discovery in visual exploration environments by query consolidation

Queries issued by casual users or specialists exploring a dataset often point to important subsets of the data, such as clusters, outliers, or other meaningful features. Capturing and caching such queries (henceforth called nuggets) has many potential benefits, including optimizing system performance and improving users' search experience. Unfortunately, current visual exploration systems have not yet tapped this resource by identifying and sharing important queries. In this paper, we introduce a query consolidation strategy aimed at the general problem of isolating important queries from the potentially huge number of queries submitted. Our solution clusters the redundant queries produced by exploration-style query specification, which is prevalent in data exploration systems. To measure the similarity between queries, we designed an effective distance metric that incorporates both the query specification and the actual query result. Because this metric is expensive to compute when comparing queries with large result sets, we designed an approximation method that is efficient while still providing excellent accuracy. A user study conducted on multivariate data sets, comparing our proposed technique to others in the literature, confirms that the proposed distance metric matches users' intuition well. As proof of feasibility, we integrated our query consolidation solution into the Nugget Management System (NMS) framework [22], which is based on the visual exploration system XmdvTool. A second user study indicates that both the efficiency and the accuracy of users' visual exploration are enhanced when supported by NMS.
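The abstract does not define the distance metric, so the following is only a minimal sketch of the general idea of consolidating queries with a distance that mixes specification and result. It rests on several assumptions that are not from the paper: each query is an axis-aligned range box (a brush) over numeric dimensions, both distance components are simple Jaccard-style overlaps, the weight `w` and the `threshold` are hypothetical parameters, and redundant queries are grouped by plain single-link clustering. It does not reproduce the authors' metric or their approximation for large result sets.

```python
from itertools import combinations

# Assumption: a range query is a list of (low, high) bounds, one per dimension.
Box = list[tuple[float, float]]


def box_volume(box: Box) -> float:
    """Hypervolume of a box; an empty extent in any dimension gives 0."""
    vol = 1.0
    for lo, hi in box:
        vol *= max(0.0, hi - lo)
    return vol


def spec_distance(a: Box, b: Box) -> float:
    """Specification part: 1 - (intersection volume / union volume)."""
    inter = box_volume([(max(la, lb), min(ha, hb)) for (la, ha), (lb, hb) in zip(a, b)])
    union = box_volume(a) + box_volume(b) - inter
    if union <= 0:
        return 0.0 if a == b else 1.0  # degenerate (zero-volume) boxes
    return 1.0 - inter / union


def result_distance(ra: set[int], rb: set[int]) -> float:
    """Result part: Jaccard distance between the tuple IDs each query returned."""
    union = ra | rb
    if not union:
        return 0.0
    return 1.0 - len(ra & rb) / len(union)


def query_distance(qa, qb, w: float = 0.5) -> float:
    """Hypothetical weighted mix of specification and result distance."""
    (box_a, res_a), (box_b, res_b) = qa, qb
    return w * spec_distance(box_a, box_b) + (1 - w) * result_distance(res_a, res_b)


def consolidate(queries, threshold: float = 0.3) -> list[list[int]]:
    """Single-link grouping of queries closer than threshold (union-find)."""
    parent = list(range(len(queries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(queries)), 2):
        if query_distance(queries[i], queries[j]) < threshold:
            parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(queries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage on a toy 2-D dataset (made up for illustration).
data = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.5, 0.5)]


def run_range_query(box: Box) -> set[int]:
    """Return IDs of rows that fall inside the box on every dimension."""
    return {i for i, row in enumerate(data)
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, box))}


q1 = [(0.0, 0.3), (0.0, 0.3)]
q2 = [(0.05, 0.3), (0.0, 0.3)]  # a slightly adjusted version of q1
q3 = [(0.7, 1.0), (0.7, 1.0)]
queries = [(q, run_range_query(q)) for q in (q1, q2, q3)]
print(consolidate(queries))  # [[0, 1], [2]]
```

Here the two nearly identical brushes `q1` and `q2` return the same tuples and collapse into one group, while `q3` stays separate. Using the result overlap alongside the box overlap reflects the abstract's point that the metric should consider both what the user specified and what the query actually returned.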

[1] Gokul Soundararajan et al. Using semantic information to improve transparent query caching for dynamic content Web sites, 2005, International Workshop on Data Engineering Issues in E-Commerce.

[2] Zbigniew W. Ras. The Role of Support and Confidence in Collaborative Query Answering, 2001, Intelligent Information Systems.

[3] Alfred Inselberg et al. Parallel coordinates: a tool for visualizing multi-dimensional geometry, 1990, Proceedings of the First IEEE Conference on Visualization: Visualization '90.

[4] J. Munkres. Algorithms for the Assignment and Transportation Problems, 1957.

[5] Alfred Inselberg et al. Parallel coordinates for visualizing multi-dimensional geometry, 1987.

[6] Elke A. Rundensteiner et al. Measuring Data Abstraction Quality in Multiresolution Visualization, 2006.

[7] D. W. Scott. On optimal and data based histograms, 1979.

[8] Ben Shneiderman et al. Tree visualization with tree-maps: 2-d space-filling approach, 1992, TOGS.

[9] Matthew O. Ward et al. XmdvTool: integrating multiple methods for visualizing multivariate data, 1994, Proceedings Visualization '94.

[10] Ji-Rong Wen et al. Query Clustering in the Web Context, 2003, Clustering and Information Retrieval.

[11] Ben Shneiderman et al. Visual information seeking: tight coupling of dynamic query filters with starfield displays, 1994, CHI '94.

[12] Surajit Chaudhuri et al. Dynamic sample selection for approximate query processing, 2003, SIGMOD '03.

[13] Sudipto Guha et al. CURE: an efficient clustering algorithm for large databases, 1998, SIGMOD '98.

[14] Andreas Buja et al. XGobi: Interactive Dynamic Data Visualization in the X Window System, 1998.

[15] Matthew O. Ward et al. Analysis Guided Visual Exploration of Multivariate Data, 2007, 2007 IEEE Symposium on Visual Analytics Science and Technology.

[16] Ji-Rong Wen et al. Clustering user queries of a search engine, 2001, WWW '01.

[17] Doug Beeferman et al. Agglomerative clustering of a search engine query log, 2000, KDD '00.

[18] Schubert Foo et al. Collaborative Querying through a Hybrid Query Clustering Approach, 2003, ICADL.

[19] H. Kuhn. The Hungarian method for the assignment problem, 1955.

[20] Matthew O. Ward et al. High Dimensional Brushing for Interactive Exploration of Multivariate Data, 1995, Proceedings Visualization '95.

[21] Matthew O. Ward et al. Measuring Data Abstraction Quality in Multiresolution Visualizations, 2006, IEEE Transactions on Visualization and Computer Graphics.

[22] Serge Abiteboul et al. Detecting changes in XML documents, 2002, Proceedings 18th International Conference on Data Engineering.

[23] Dongwon Lee et al. Semantic caching via query matching for web sources, 1999, CIKM '99.

[24] Kai-Hsiang Yang et al. Approximate string matching in LDAP based on edit distance, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[25] Mark Claypool et al. Implicit interest indicators, 2001, IUI '01.

[26] Fazli Can et al. Incremental clustering for dynamic information processing, 1993, TOIS.

[27] Kristin A. Cook et al. Illuminating the Path: The Research and Development Agenda for Visual Analytics, 2005.

[28] Tian Zhang et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[29] Michelle X. Zhou et al. Interactive Visual Synthesis of Analytic Knowledge, 2006, 2006 IEEE Symposium On Visual Analytics Science And Technology.

[30] Paul E. Keel. Collaborative Visual Analytics: Inferring from the Spatial Organization and Collaborative Use of Information, 2006, 2006 IEEE Symposium On Visual Analytics Science And Technology.

[31] Patrick A. V. Hall et al. Approximate String Matching, 1994, Encyclopedia of Algorithms.