Querying Heterogeneous Document Stores

NoSQL document stores can hold documents with differing structures in the same collection. As a result, users must formulate queries that cover every possible representation of the desired information across the different schemas. In this paper, we propose a novel approach for applying query operators to a collection of structurally heterogeneous documents. Our work introduces an automatic query rewriting mechanism for queries built from combinations of three elementary operators: project, restrict and aggregate. We generate a custom dictionary that tracks all representations of each attribute used in the documents. Finally, we evaluate our approach through a series of experiments and discuss the results.
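
To make the idea concrete, the following is a minimal sketch of how a dictionary of attribute representations could drive the rewriting of restrict and project. It is an illustration under stated assumptions, not the implementation described in the paper: the dotted-path notation, the sample `films` collection, and the function names `build_dictionary`, `restrict` and `project` are all hypothetical.

```python
from collections import defaultdict

_MISSING = object()  # sentinel: distinguishes "path absent" from a stored None


def paths(doc, prefix=""):
    """Yield the dotted path of every attribute in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        yield path
        if isinstance(value, dict):
            yield from paths(value, path)


def build_dictionary(collection):
    """Map each attribute name to all paths under which it occurs (assumed layout)."""
    found = defaultdict(set)
    for doc in collection:
        for path in paths(doc):
            name = path.rsplit(".", 1)[-1]  # last path segment is the attribute name
            found[name].add(path)
    return {name: sorted(ps) for name, ps in found.items()}


def lookup(doc, path):
    """Follow a dotted path through a document; return _MISSING if it does not exist."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def restrict(collection, attribute, value, dictionary):
    """Rewritten restrict: a document matches if any known path holds the value."""
    candidates = dictionary.get(attribute, [attribute])
    return [d for d in collection if any(lookup(d, p) == value for p in candidates)]


def project(collection, attributes, dictionary):
    """Rewritten project: take each attribute from the first path present in the document."""
    rows = []
    for doc in collection:
        row = {}
        for attribute in attributes:
            for path in dictionary.get(attribute, [attribute]):
                found = lookup(doc, path)
                if found is not _MISSING:
                    row[attribute] = found
                    break
        rows.append(row)
    return rows


# Hypothetical collection: the same information stored under three structures.
films = [
    {"title": "Heat", "year": 1995},
    {"info": {"title": "Ronin", "year": 1998}},
    {"title": "Alien", "details": {"year": 1979}},
]
dictionary = build_dictionary(films)
matches = restrict(films, "year", 1998, dictionary)
titles = project(films, ["title"], dictionary)
```

In this sketch the user writes the query against the attribute name `year` only; the dictionary expands it to `year`, `info.year` and `details.year`, so the document that nests the attribute under `info` is still found. An aggregate operator could be rewritten in the same way by resolving its grouping and aggregated attributes through the dictionary.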
