Implementation of the Concept of a Repository for Automated Processing of Semi-Structural Data

Semi-structural data are problematic because their attributes are sparse and because, regardless of type, they are immensely diverse. Storing them is therefore a challenge, especially when the data must be kept in a relational database, which is often a strict requirement defined in advance. In this paper, we present a thoroughly described concept of a repository capable of storing and processing semi-structural data. Based on this concept, we establish a database model comprising the architecture and the tools needed to search the data and build relevant processors. The processor described can assign roles and dispatch tasks among users. We demonstrate how the repository overcomes current limitations by building a system that facilitates the digitization of scientific resources. In addition, we show that the repository is suitable for general use and, as such, may be adapted to any domain in which semi-structural data are processed, without any additional work required.

Keywords: document management system, ECM, JSON, workflow.
