Integrated access to big data polystores through a knowledge-driven framework

The recent successes of commercial cognitive and AI applications have cast a spotlight on knowledge graphs and the benefits of consuming structured semantic data. Knowledge graphs are now so widespread that organizations often view them as a “single source of truth” for all of their data and other digital artifacts. In most organizations, however, Big Data comes in many forms, including time series, images, and unstructured text, that are often unsuitable for efficient storage within a knowledge graph. This paper presents the Semantics Toolkit (SemTK), a framework that enables access to polyglot-persistent Big Data stores while giving the appearance that all data is fully captured within a knowledge graph. SemTK allows data to be stored across multiple storage platforms (e.g., Big Data stores such as Hadoop, graph databases, and semantic triple stores), with the best-suited platform adopted for each data type, while maintaining a single logical interface and point of access. Users thereby see a knowledge-driven veneer across their data. We describe the ease of use and benefits of constructing and querying polystore knowledge graphs with SemTK through four industrial use cases at GE.
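
The idea of a single logical interface over several stores can be illustrated with a minimal Python sketch. This is not SemTK's API: the class names (`ExternalRef`, `PolystoreGraph`), the store name `"timeseries"`, the `turbine:42` entity, and all values are hypothetical and invented purely for illustration. The sketch assumes only what the abstract states, namely that bulky data lives in a better-suited store while the caller queries what looks like one knowledge graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class ExternalRef:
    """Pointer kept in the graph in place of bulky data (hypothetical type)."""
    store: str  # name of the backing store, e.g. "timeseries"
    key: str    # identifier of the record inside that store


Value = Union[str, float, ExternalRef]


class PolystoreGraph:
    """Toy knowledge graph whose property values may live in external stores."""

    def __init__(self) -> None:
        self._properties: Dict[str, Dict[str, Value]] = {}
        self._fetchers: Dict[str, Callable[[str], object]] = {}

    def register_store(self, name: str, fetch: Callable[[str], object]) -> None:
        # Each external store is represented only by a lookup function.
        self._fetchers[name] = fetch

    def add(self, subject: str, predicate: str, value: Value) -> None:
        self._properties.setdefault(subject, {})[predicate] = value

    def get(self, subject: str, predicate: str) -> object:
        # Callers use one access point; pointers are resolved transparently.
        value = self._properties[subject][predicate]
        if isinstance(value, ExternalRef):
            return self._fetchers[value.store](value.key)
        return value


def build_demo_graph() -> PolystoreGraph:
    # Stand-in for a time-series store; the readings are made-up numbers.
    timeseries_store = {"ts-001": [410.0, 412.5, 415.1]}

    graph = PolystoreGraph()
    graph.register_store("timeseries", timeseries_store.__getitem__)
    graph.add("turbine:42", "model", "example-model")
    graph.add("turbine:42", "exhaustTemperature",
              ExternalRef("timeseries", "ts-001"))
    return graph


if __name__ == "__main__":
    demo = build_demo_graph()
    print(demo.get("turbine:42", "model"))
    print(demo.get("turbine:42", "exhaustTemperature"))
```

In this sketch the caller cannot tell from `get` whether a value was stored in the graph or fetched from the external store, which is the "veneer" the abstract describes. A real system would also need query planning and dispatch across stores, which the sketch omits.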
