Deep scientific computing requires deep data

Increasingly, scientific advances require the fusion of large amounts of complex data with extraordinary amounts of computational power. The problems of deep science demand deep computing and deep storage resources. In addition to teraflop-range computing engines with their own local storage, facilities must provide large data repositories on the order of 10 to 100 petabytes, along with networking that can move multi-terabyte files in a timely and secure manner. This paper examines such problems and identifies the associated challenges. It discusses some of the storage systems and data management methods that computing facilities need to address these challenges, and describes some ongoing improvements.
