Processing HDF5 Datasets on Multi-core Architectures

To scale, scientific middleware and applications must be designed to exploit the evolving multi-core processor architectures available in grid and cloud computing environments. In this paper, we analyze processing and scheduling techniques on multi-core architectures in light of scientific data characteristics and access patterns. More specifically, we conduct a fine-grained analysis of scientific datasets stored in formats such as HDF5 to make effective processing and scheduling decisions in multi-threaded programs. We present a performance analysis of how processing threads can be scheduled on multi-core nodes to improve the performance of scientific applications that process HDF5 data. To accomplish this, we introduce a dynamic marking scheme that tracks the progress of the threads on each core. The marks help determine work allocation, which reduces overall application execution time.
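The abstract does not spell out the marking scheme, so the following is only a minimal sketch of one way progress marks could drive work allocation, not the authors' implementation. It assumes the dataset has already been split into independent chunks. The names `ProgressMarks`, `claim`, `mark`, and `process_chunks` are hypothetical, and plain Python lists stand in for HDF5 chunks so the example runs without an HDF5 library.

```python
import threading


class ProgressMarks:
    """Hypothetical shared bookkeeping: which chunk is next, and how many
    chunks each worker has finished (its progress mark)."""

    def __init__(self, num_chunks, num_workers):
        self._lock = threading.Lock()
        self._next = 0                      # index of the next unclaimed chunk
        self.num_chunks = num_chunks
        self.done = [0] * num_workers       # per-worker progress marks

    def claim(self):
        """Hand out the next unclaimed chunk index, or None when none remain."""
        with self._lock:
            if self._next >= self.num_chunks:
                return None
            idx = self._next
            self._next += 1
            return idx

    def mark(self, worker_id):
        """Record that a worker finished one more chunk."""
        with self._lock:
            self.done[worker_id] += 1


def process_chunks(chunks, num_workers, work_fn):
    """Process chunks with worker threads that pull work as they become free,
    so faster workers naturally end up with more chunks."""
    marks = ProgressMarks(len(chunks), num_workers)
    results = [None] * len(chunks)

    def worker(worker_id):
        while True:
            idx = marks.claim()
            if idx is None:
                return
            results[idx] = work_fn(chunks[idx])  # each index is written once
            marks.mark(worker_id)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, marks.done


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]            # stand-in for HDF5 chunks
    totals, per_worker = process_chunks(data, num_workers=2, work_fn=sum)
    print(totals, per_worker)
```

The per-worker counts in `done` are what a scheduler could inspect to rebalance remaining work. Note that in CPython the global interpreter lock prevents CPU-bound threads from running in parallel, so this sketch illustrates the bookkeeping rather than the speedup reported in the paper.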
