File Systems and Access Technologies for the Large Scale Data Facility

Research projects produce huge amounts of data, which have to be stored and analyzed immediately after acquisition. Storing and analyzing data at high rates is normally not possible within the detectors themselves, and the problem worsens when several detectors with similar data rates are used within one project. To store the data for analysis, it has to be transferred to an appropriate infrastructure where it is accessible at any time and from different clients. The Large Scale Data Facility (LSDF), currently under development at KIT, is designed to fulfill the requirements of data-intensive scientific experiments and applications. At present, the LSDF consists of a testbed installation for evaluating different technologies. From a user's point of view, the LSDF is a huge data sink that provides 6 PB of storage in its initial stage and will be accessible via several interfaces. Because users are not interested in learning dozens of APIs for accessing data, a generic API, the ADALAPI, has been designed; it provides uniform interfaces for transparent access to the LSDF over different technologies. The present contribution evaluates technologies usable for the development of the LSDF to meet the requirements of various scientific projects. The ADALAPI and the first GUI based on it are also introduced.
