Discovering Implicit Schemas in JSON Data

JSON has become a very popular lightweight format for data exchange. It is human-readable and easy for computers to parse and use. However, JSON is schemaless. Although this brings some benefits (e.g., flexibility in how data is represented), it becomes a problem when consuming and integrating data from different JSON services, since developers need to know the structure of data that carries no explicit schema. We believe that a mechanism to discover (and visualize) the implicit schema of JSON data would greatly facilitate the creation and use of JSON services. For instance, it would help developers understand the links between a set of services belonging to the same domain or API. To this end, we propose a model-based approach to generate the underlying schema of a set of JSON documents.
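To make the notion of an implicit schema concrete, the following is a minimal sketch of how a schema might be recovered from a handful of JSON documents. It is not the model-based approach proposed here; it only merges the keys and value types observed across documents. The function names `type_name` and `infer_schema`, the output layout (`types`, `optional`, `fields`), and the sample documents are all assumptions made for illustration.

```python
import json
from typing import Any, Dict, List


def type_name(value: Any) -> str:
    """Map a parsed JSON value to the name of its JSON type."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the keys of several JSON objects into one schema (hypothetical layout).

    For each key, record the observed types, whether the key is missing
    from some documents, and a nested schema for object-valued keys.
    """
    seen: Dict[str, Dict[str, Any]] = {}
    for doc in documents:
        for key, value in doc.items():
            entry = seen.setdefault(key, {"types": set(), "count": 0, "children": []})
            entry["types"].add(type_name(value))
            entry["count"] += 1
            if isinstance(value, dict):
                entry["children"].append(value)

    schema: Dict[str, Any] = {}
    for key, entry in seen.items():
        field: Dict[str, Any] = {
            "types": sorted(entry["types"]),
            "optional": entry["count"] < len(documents),
        }
        if entry["children"]:
            # Recurse into nested objects to describe their own fields.
            field["fields"] = infer_schema(entry["children"])
        schema[key] = field
    return schema


# Two made-up documents, as two calls to the same service might return them.
docs = [
    json.loads('{"id": 1, "name": "Ada", "address": {"city": "Paris"}}'),
    json.loads('{"id": 2, "name": "Bob", "email": null}'),
]
schema = infer_schema(docs)
print(json.dumps(schema, indent=2))
```

In this sketch a key counts as optional when at least one document omits it, and conflicting types for the same key are kept side by side rather than resolved. Arrays are only labeled as such; their element types are not inspected.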
