The Large Scale Data Facility: Data Intensive Computing for Scientific Experiments

The Large Scale Data Facility (LSDF) at the Karlsruhe Institute of Technology was started at the end of 2009 to support the growing requirements of data-intensive experiments. In close cooperation with the scientific communities involved, the LSDF provides them not only with adequate storage space but also with a directly attached analysis farm and, more importantly, with value-added services for their big scientific data sets. Analysis workflows are supported by mixed Hadoop and OpenNebula cloud environments that are directly attached to the storage and enable efficient processing of the experimental data. Metadata handling is a central part of the LSDF: a metadata repository, community-specific metadata schemes, graphical tools, and APIs were developed for accessing and efficiently organizing the stored data sets.