A software reference architecture for semantic-aware Big Data systems

Abstract

Context: Big Data systems are a class of software systems that ingest, store, process and serve massive amounts of heterogeneous data from multiple sources. Despite their undisputed impact on current society, their engineering is still in its infancy, and companies find them difficult to adopt because of their inherent complexity. Existing attempts to provide architectural guidelines for their engineering fail to take into account important Big Data characteristics, such as the management, evolution and quality of the data.

Objective: In this paper, we follow software engineering principles to refine the λ-architecture, a reference model for Big Data systems, and use it as a seed to create Bolster, a software reference architecture (SRA) for semantic-aware Big Data systems.

Method: By adding a new layer to the λ-architecture, the Semantic Layer, Bolster is capable of handling the most representative Big Data characteristics (i.e., Volume, Velocity, Variety, Variability and Veracity).

Results: We present the successful implementation of Bolster in three industrial projects involving five organizations. The validation results show a high level of agreement among practitioners from all organizations with respect to standard quality factors.

Conclusion: As an SRA, Bolster allows organizations to design concrete architectures tailored to their specific needs. Its distinguishing feature is that it provides semantic awareness in Big Data systems. Semantic-aware Big Data systems are implementations that include components to simplify data definition and exploitation. In particular, they leverage metadata (i.e., data describing data) to enable (partial) automation of data exploitation and to aid users in their decision-making processes. This simplification supports the differentiation of responsibilities into cohesive roles, which enhances data governance.
