LIVE data workspace: A flexible, dynamic and extensible platform for petascale applications

The data needs of current and future petascale applications have increased over the last half decade to the extent that appropriate data management has become a crucial requirement. This concerns not only the storage of data produced by the new class of petascale applications, but also the data exchanges needed for coupling applications with concurrent analysis, online data visualization for validation, and other tasks. To address such dynamic code coupling, we introduce the concept of an extensible, dynamic, and flexible data workspace, termed LIVE. In contrast to data exchanges programmed with MPI, MPI-IO, or grid software, LIVE focuses on data exchanges carried out without a priori knowledge of potential data requirements. Examples include exchanges required by ad hoc or dynamically determined methods for data validation, for general data analysis tasks, or for data visualization. Run on an execution environment composed of integrated dynamic discovery and online management services, LIVE is used to create a 'data workspace' for a working molecular dynamics code base used by mechanical and materials engineers at Georgia Tech for multi-scale materials modeling. Measurements of both this application's data workspace and of the basic primitives in the LIVE framework demonstrate that the environment's substantial flexibility has minimal impact on overall performance and, in fact, that it improves performance in a number of usage scenarios. In particular, for a visualization pipeline example derived from our collaborators, we see a slight improvement over a solution based on MPI-IO, and a further improvement of up to 5% from using LIVE's ability to overlap communication with user-specified computation.
