Interactive Navigation of Open Data Linkages

We developed Toronto Open Data Search to support the ad hoc, interactive discovery of connections or linkages between datasets. It can be used to efficiently navigate through the open data cloud. Our system consists of three parts: a user-interface provided by a Web application; a scalable backend infrastructure that supports navigational queries; and a dynamic repository of open data tables. Our system uses LSH Ensemble, an efficient index structure, to compute linkages (attributes in two datasets with high containment score) in real time at Internet scale. Our application allows users to navigate along these linkages by joining datasets. LSH Ensemble is scalable, providing millisecond response times for linkage discovery queries even over millions of datasets. Our system offers users a highly interactive experience making unrelated (and unlinked) dynamic collections of datasets appear as a richly connected cloud of data that can be navigated and combined easily in real time.

[1]  Din J. Wasem,et al.  Mining of Massive Datasets , 2014 .

[2]  Renée J. Miller,et al.  Discovering Linkage Points over Web Data , 2013, Proc. VLDB Endow..

[3]  Dominique Ritze,et al.  A Large Public Corpus of Web Tables containing Time and Context Metadata , 2016, WWW.

[4]  Renée J. Miller,et al.  LSH Ensemble: Internet-Scale Domain Search , 2016, Proc. VLDB Endow..

[5]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[6]  Heiko Paulheim,et al.  The Mannheim Search Join Engine , 2015, J. Web Semant..

[7]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[8]  Reynold Xin,et al.  Finding related tables , 2012, SIGMOD Conference.