A general framework for big data knowledge discovery and integration

Data structure description, conceptual modeling, and logical reasoning for knowledge discovery are three critical factors in integrating heterogeneous information. In particular, NoSQL database and Internet of Things technologies create an urgent need for a uniform representation of heterogeneous data. Yet little research has addressed either the integration of NoSQL databases with traditional data models or the semantic description of big data. To tackle these problems, this paper first proposes a concept-and-relation-oriented grid data model, called the GODM model, based on the definitions of Monad, Compounder, Relation, etc. The GODM model is then used to describe traditional data models and NoSQL data models uniformly, which eliminates the structural differences among heterogeneous data. Next, based on the GODM relation mechanism, an extensible semantic system is built. It takes the SHOIQ(D) description logic as an example and establishes its correspondence with a subset of the GODM grammar, providing fundamental support for the semantic integration of heterogeneous data and for knowledge discovery from it. After that, GODM is compared comprehensively with other models, in particular with OWL, with respect to relation mechanism, hybrid schema, description logic, grammatical constructors, etc. In addition, a general evaluation model is proposed and used to evaluate experimentally and analyze the time and space efficiency of several primary common data models. The results show that the GODM model has considerable advantages in expressiveness, flexibility, etc., and particularly in time and space efficiency.
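The abstract does not give the definitions of Monad, Compounder, or Relation, so the sketch below does not reproduce GODM. It only illustrates the general idea of a uniform representation: a flat relational row and a nested NoSQL document are both mapped into one common structure, here assumed to be (entity, relation, value) triples. The function names `from_relational` and `from_document` and the entity-id format are hypothetical choices made for illustration.

```python
from typing import Any

# Hypothetical common representation: (entity id, relation name, value or child entity id).
Triple = tuple[str, str, Any]


def from_relational(table: str, key: str, row: dict[str, Any]) -> list[Triple]:
    """Map a flat relational row: each non-key column becomes one relation."""
    entity = f"{table}:{row[key]}"
    return [(entity, col, val) for col, val in row.items() if col != key]


def from_document(collection: str, doc: dict[str, Any], id_field: str = "_id") -> list[Triple]:
    """Map a nested document: sub-objects become child entities, lists fan out."""
    triples: list[Triple] = []

    def walk(entity: str, obj: dict[str, Any]) -> None:
        for name, val in obj.items():
            if name == id_field:
                continue
            items = val if isinstance(val, list) else [val]
            for i, item in enumerate(items):
                if isinstance(item, dict):
                    # Link the parent to a synthesized child entity, then recurse.
                    child = f"{entity}/{name}[{i}]"
                    triples.append((entity, name, child))
                    walk(child, item)
                else:
                    triples.append((entity, name, item))

    walk(f"{collection}:{doc[id_field]}", doc)
    return triples


row = {"id": 7, "name": "Alice", "city": "Lyon"}
doc = {"_id": 7, "name": "Alice", "address": {"city": "Lyon"}}
print(from_relational("person", "id", row))
print(from_document("person", doc))
```

Both sources end up as lists of the same triple shape, so downstream code no longer needs to know whether a fact came from a table or a document. The extra `address` hop in the document output shows that the nesting is preserved as an explicit relation rather than flattened away.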
In summary, the GODM model describes heterogeneous data in terms of both data structure and semantic relationships. It also realizes a hybrid schema that reconciles schemaful and schemaless data models, which makes it especially suitable for dynamic data integration and for knowledge discovery over big data.
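The hybrid schema is only named in the abstract. The following is a minimal sketch of the general idea, assuming that "hybrid" means declared attributes are typed and enforced (schemaful) while undeclared attributes are accepted and kept as-is (schemaless). `HybridSchema` and `validate` are hypothetical names and are not part of GODM.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HybridSchema:
    """Hypothetical hybrid schema: declared attributes are type-checked,
    undeclared attributes are retained instead of being rejected."""
    declared: dict[str, type]
    required: set[str] = field(default_factory=set)

    def validate(self, record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
        missing = self.required - set(record)
        if missing:
            raise ValueError(f"missing required attributes: {sorted(missing)}")
        fixed: dict[str, Any] = {}  # schemaful part
        extra: dict[str, Any] = {}  # schemaless part
        for name, value in record.items():
            if name in self.declared:
                expected = self.declared[name]
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{name}: expected {expected.__name__}, got {type(value).__name__}"
                    )
                fixed[name] = value
            else:
                extra[name] = value
        return fixed, extra


schema = HybridSchema({"id": int, "name": str}, required={"id"})
print(schema.validate({"id": 1, "name": "Ann", "tags": ["x"]}))
```

Splitting each record into a validated part and an open part lets a fixed core structure coexist with attributes that appear only in some sources, which is the kind of flexibility dynamic integration requires.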