Multi-Attribute Similarity Search for Interactive Data Exploration

We have developed SimSearch, a tool that simplifies data exploration by enabling top-k similarity search over large collections of entities described by multiple heterogeneous attributes drawn from different sources. We present the supported modes for data access and the query mechanism that orchestrates multi-attribute similarity search over diverse types of attributes, including textual, numerical, and spatial. Users can specify their query parameters and preferences through a web interface, and can inspect and compare the results through visualizations suited to each type of attribute involved. We demonstrate SimSearch using a real-world commercial dataset, highlighting its capabilities for interactive, user-friendly, and intuitive data exploration.
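To make the idea of multi-attribute top-k similarity search concrete, the following is a minimal sketch and not SimSearch's actual query mechanism. It assumes each entity has one textual attribute (a keyword set), one numerical attribute, and one spatial attribute, scores each attribute separately, and combines the scores with a user-weighted average over an exhaustive scan. The field names (`keywords`, `revenue`, `location`), the decay scales, and the weighted-average aggregation are all hypothetical choices made for illustration.

```python
import heapq
import math


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (textual attribute)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def decay(distance, scale):
    """Map a non-negative distance to a similarity in (0, 1]."""
    return math.exp(-distance / scale)


def top_k(entities, query, weights, k, num_scale=100.0, geo_scale=1.0):
    """Return the k entities with the highest weighted-average similarity.

    Assumption: a brute-force scan over all entities; a real system
    would rely on per-attribute indexes instead.
    """
    total_weight = sum(weights.values())

    def score(entity):
        s_text = jaccard(entity["keywords"], query["keywords"])
        s_num = decay(abs(entity["revenue"] - query["revenue"]), num_scale)
        s_geo = decay(math.dist(entity["location"], query["location"]), geo_scale)
        combined = (
            weights["keywords"] * s_text
            + weights["revenue"] * s_num
            + weights["location"] * s_geo
        )
        return combined / total_weight

    return heapq.nlargest(k, entities, key=score)


# Hypothetical toy data, not taken from the dataset used in the demonstration.
companies = [
    {"name": "A", "keywords": {"coffee", "bakery"}, "revenue": 120.0, "location": (0.0, 0.0)},
    {"name": "B", "keywords": {"coffee", "roastery"}, "revenue": 300.0, "location": (0.5, 0.5)},
    {"name": "C", "keywords": {"hardware"}, "revenue": 200.0, "location": (5.0, 5.0)},
]
query = {"keywords": {"coffee", "bakery"}, "revenue": 120.0, "location": (0.0, 0.0)}
weights = {"keywords": 1.0, "revenue": 1.0, "location": 1.0}

results = top_k(companies, query, weights, k=2)
```

Changing the entries of `weights` shifts the ranking toward the attribute the user cares about most, which is the kind of preference a query interface would expose.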
