The Solid architecture for real-time management of big semantic data

Big Data management has become a critical task in many application systems, which usually rely on heavyweight batch processes to manage such large amounts of data. However, batch architectures are not an adequate choice for designing real-time systems in which data updates and reads must be satisfied with very low latency. Gathering and consuming high volumes of data at high velocity is therefore an emerging challenge, which we specifically address in the scope of innovative scenarios based on semantic data (RDF) management. The Linked Open Data initiative and emerging projects in the Internet of Things are examples of such scenarios. This paper describes a new architecture (referred to as Solid) which separates the complexities of Big Semantic Data storage and indexing from real-time data acquisition and consumption. This decision relies on the use of two optimized datastores which respectively store historical (big) data and run-time data. It ensures efficient volume management and high processing velocity, but adds the need to coordinate both datastores. Solid proposes a 3-tiered architecture in which each responsibility is specifically addressed. Besides its theoretical description, we also propose and evaluate a Solid prototype built on top of binary RDF and state-of-the-art triplestores. Our experimental results show that Solid achieves large savings in data storage (it uses up to 5 times less space than the compared triplestores), while providing efficient SPARQL resolution over the Big Semantic Data (in the order of 10-20 ms for the studied queries). These experiments also show that Solid ensures low-latency operations because the data effectively managed in real-time remains small, and so does not suffer from Big Data issues.
We propose an architecture (Solid) for managing big semantic data in real-time.
Specific big data and real-time responsibilities are isolated in dedicated layers.
A dynamic pipe-filter solution is introduced for addressing query responsibilities.
Solid leverages RDF/HDT features to obtain the most compressed representations.
The Solid prototype performs competitively with the most prominent triplestores.
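
The separation between a large historical datastore and a small run-time datastore, and the need to coordinate the two, can be illustrated with a minimal in-memory sketch. This is not the Solid prototype, which is built on binary RDF and existing triplestores. The class `TwoStoreSketch` and its methods `insert`, `query` and `merge` are hypothetical names chosen for illustration, and plain Python sets stand in for both datastores.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
# A pattern uses None as a wildcard in any of the three positions.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """Return True if every bound position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


class TwoStoreSketch:
    """Hypothetical illustration: one read-only store plus one small mutable store."""

    def __init__(self, historical: Iterable[Triple] = ()) -> None:
        # Assumption: the historical store is immutable between merges.
        self.historical = frozenset(historical)
        # Assumption: the run-time store only holds recently acquired triples.
        self.runtime = set()

    def insert(self, triple: Triple) -> None:
        """Acquire a new triple in the run-time store only."""
        if triple not in self.historical:
            self.runtime.add(triple)

    def query(self, pattern: Pattern) -> Iterator[Triple]:
        """Filter both stores with the same pattern and yield the combined results."""
        for store in (self.historical, self.runtime):
            for triple in store:
                if matches(triple, pattern):
                    yield triple

    def merge(self) -> None:
        """Coordinate both stores: fold run-time data into the historical store."""
        self.historical = self.historical | self.runtime
        self.runtime = set()


store = TwoStoreSketch([("sensor1", "locatedIn", "roomA")])
store.insert(("sensor1", "hasReading", "21.5"))
results = sorted(store.query(("sensor1", None, None)))
store.merge()
```

In this sketch a query sees both old and newly inserted triples before any merge takes place, while the run-time set stays small because `merge` empties it periodically.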
