Addressing big data issues in Scientific Data Infrastructure

Big Data is becoming a new technology focus in both science and industry. This paper discusses the challenges that Big Data imposes on modern and future Scientific Data Infrastructure (SDI). It discusses the nature and definition of Big Data, including its characteristic features: Volume, Velocity, Variety, Value, and Veracity. The paper draws on the practices of different scientific communities to define requirements for data management, access control, and security. It introduces the Scientific Data Lifecycle Management (SDLM) model, which includes all the major stages of the data lifecycle and reflects the specifics of data management in modern e-Science. The paper proposes a generic SDI architecture model that provides a basis for building interoperable data-centric or project-centric SDI using modern technologies and best practices. Finally, it explains how the proposed SDLM and SDI models can be naturally implemented using the modern cloud-based infrastructure service provisioning model, and it suggests the major infrastructure components for Big Data.
