Analyzing Scientific Data Sharing Patterns for In-network Data Caching

The volume of data moving through a network increases with new scientific experiments and simulations, and the network bandwidth required to deliver that data within a given time frame increases proportionally. We observe that a significant portion of popular datasets is transferred multiple times, both to different users and to the same user, for various reasons. In-network caching of shared data has been shown to reduce redundant data transfers and consequently save network traffic volume. In addition, overall application performance is expected to improve with in-network caching, because access to locally cached data has lower latency. This paper shows how much data was shared over the study period, how much network traffic volume was saved as a result, and how much temporary in-network caching improved scientific application performance. It also analyzes data access patterns in applications and the impact of caching nodes on the regional data repository. From the results, we observed that network bandwidth demand was reduced by nearly a factor of 3 over the study period.
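As a rough illustration of how a reduction factor like the one above could be computed, the sketch below divides the total bytes requested by applications by the bytes that still had to cross the wide-area network (cache misses). The function name and the byte counts are hypothetical and are not taken from the study; the paper's own accounting may differ.

```python
def traffic_reduction_factor(hit_bytes: float, miss_bytes: float) -> float:
    """Ratio of total requested bytes to bytes fetched from the remote source.

    hit_bytes:  bytes served from the in-network cache (hypothetical input)
    miss_bytes: bytes fetched over the wide-area network on cache misses
    """
    if hit_bytes < 0 or miss_bytes <= 0:
        raise ValueError("hit_bytes must be >= 0 and miss_bytes must be > 0")
    total_bytes = hit_bytes + miss_bytes
    return total_bytes / miss_bytes


# Illustrative numbers only: 200 TB served from cache, 100 TB fetched remotely.
factor = traffic_reduction_factor(hit_bytes=200.0, miss_bytes=100.0)
print(f"Traffic reduction factor: {factor:.1f}")
```

Under this definition, a factor of 3 corresponds to roughly two thirds of the requested bytes being served from the cache.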
