Matchmaking, datasets and physics analysis

Grid-enabled physics analysis requires a workload management system (WMS) that finds suitable computing resources on which to execute data-intensive jobs. A typical example is the WMS available in the LCG2 (also referred to as EGEE-0) software system, which is used by several scientific experiments. Like many other current grid systems, LCG2 provides file-level granularity for accessing and analysing data. However, application scientists such as high-energy physicists often require a higher level of abstraction for data access: they prefer to work with datasets rather than individual files in their physics analysis. We have improved the current WMS (in particular the Matchmaker) so that physicists can express the requirements of their analysis jobs in terms of datasets. This required modifications to the WMS and to its interface to potential data catalogues. As part of this work, we propose a simple data location interface, based on a Web service approach, that allows the WMS to interoperate with new dataset and file catalogues. We use a particular high-energy physics experiment as a case study and show that physics analysis can be improved by our modifications to the current grid system.
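
The abstract describes dataset-level matchmaking only at a high level, so the following is a hedged illustration rather than the authors' implementation. It assumes a hypothetical data location interface with a single operation, `replicas_for_dataset`, that maps a dataset name to its logical files and the storage elements holding replicas of each file. It also assumes a deliberately simple ranking rule: prefer computing elements whose close storage element holds the largest fraction of the dataset. All class, method, dataset and host names are illustrative, and the in-memory catalogue stands in for what would be a Web service call in a real deployment.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set


class DataLocationInterface(Protocol):
    """Hypothetical catalogue-neutral lookup; in practice this would sit behind a Web service."""

    def replicas_for_dataset(self, dataset: str) -> Dict[str, Set[str]]:
        """Return {logical file name: set of storage elements holding a replica}."""
        ...


class InMemoryCatalogue:
    """Stand-in for a dataset/file catalogue reached through the interface above."""

    def __init__(self, datasets: Dict[str, Dict[str, Set[str]]]) -> None:
        self._datasets = datasets

    def replicas_for_dataset(self, dataset: str) -> Dict[str, Set[str]]:
        return self._datasets.get(dataset, {})


@dataclass
class ComputingElement:
    name: str
    close_se: str  # storage element considered "close" to this computing element


def match_by_dataset(dataset: str, dli: DataLocationInterface,
                     ces: List[ComputingElement]) -> List[str]:
    """Rank computing elements by the fraction of the dataset's files held at their close SE."""
    replicas = dli.replicas_for_dataset(dataset)
    if not replicas:
        return []  # unknown or empty dataset: nothing to match against
    ranked = []
    for ce in ces:
        local = sum(1 for ses in replicas.values() if ce.close_se in ses)
        if local:  # skip sites with no local copy of any file in the dataset
            ranked.append((local / len(replicas), ce.name))
    # Highest local fraction first; ties broken by name for deterministic output.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked]


if __name__ == "__main__":
    catalogue = InMemoryCatalogue({
        "example-dataset": {
            "lfn:file1": {"se1.example.org", "se2.example.org"},
            "lfn:file2": {"se1.example.org"},
        }
    })
    sites = [
        ComputingElement("ce1.example.org", "se1.example.org"),
        ComputingElement("ce2.example.org", "se2.example.org"),
        ComputingElement("ce3.example.org", "se3.example.org"),
    ]
    print(match_by_dataset("example-dataset", catalogue, sites))
    # prints ['ce1.example.org', 'ce2.example.org']
```

Keeping the matchmaker dependent only on this narrow lookup, rather than on one particular catalogue, is one plausible way to obtain the interoperability the abstract describes: any dataset or file catalogue able to answer the query could be plugged in without changing the matching logic.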
