Automatic Extraction of a Document-oriented NoSQL Schema

NoSQL systems make it possible to manage databases (DBs) that exhibit the 3Vs: Volume, Variety and Velocity. Most of these systems are schemaless, meaning that no data schema is declared when a DB is created. This property provides undeniable flexibility, since the schema can evolve while the DB is in use; however, it is a major obstacle for developers and decision makers, because expressing queries (of the SQL type) requires precise knowledge of the schema. In this article, we present a process for automatically extracting the schema of a document-oriented NoSQL DB. The process relies on MDA (Model Driven Architecture): starting from a NoSQL DB, we propose transformation rules that generate the schema. The extraction process was evaluated experimentally on a medical application.
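To make the idea of schema extraction concrete, the following is a minimal sketch in Python. It is not the MDA-based process or the transformation rules of the article; it is a plain scan over documents, assuming they are already loaded as Python dictionaries. The names `extract_schema`, `type_name` and the sample `patients` collection are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List, Set


def type_name(value: Any) -> str:
    """Map a Python value to a JSON-like type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def _merge(schema: Dict[str, Set[str]], doc: Dict[str, Any], prefix: str) -> None:
    """Record the type of every field of doc, recursing into nested documents."""
    for key, value in doc.items():
        path = prefix + key
        schema.setdefault(path, set()).add(type_name(value))
        if isinstance(value, dict):
            _merge(schema, value, path + ".")


def extract_schema(documents: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return a mapping from dotted field path to the set of types observed."""
    schema: Dict[str, Set[str]] = {}
    for doc in documents:
        _merge(schema, doc, "")
    return schema


# Hypothetical sample collection: documents need not share the same fields.
patients = [
    {"name": "Ann", "age": 42},
    {"name": "Bob", "address": {"city": "Paris"}},
    {"name": "Eve", "age": "unknown"},
]

schema = extract_schema(patients)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

A field that maps to more than one type, such as `age` in the sample, is the kind of variation a schemaless DB permits and that a query author would need to know about in advance.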
