EventDB: A Large-Scale Semi-structured Scientific Data Management System

In scientific research, the volume of data collected by experimental devices has reached hundreds of petabytes (PB) per year, so using these data efficiently to produce scientific findings has become a pressing problem. Scientific big data poses many challenges, including how to store, process, and share it. In this paper, we propose EventDB, a data management system for massive semi-structured scientific data. EventDB has three components: IndexDB, which provides faster data retrieval; cross-domain access, which enables better data sharing; and operator libraries, which provide higher-performance data analysis. Our preliminary experiments show that EventDB speeds up data retrieval by more than sixfold.