A uniform representation of multi-variant data in intensive-query databases

This paper introduces a new approach to the representation of multi-variant data. Current approaches consist of either hard-core coding techniques or conceptual/logical models that integrate structured and semi-structured data in customized, application-specific ways. The representation introduced here relies instead on an unfolding technique to represent multi-variant data uniformly. This leads to a framework with core functionalities for organizing structured and semi-structured data. The paper also presents an efficient methodology for retrieving data from the proposed storage, along with a comparative performance analysis against existing practices. The accuracy, precision, and recall of the proposed technique are evaluated quantitatively and reported.
