Building architectures for data‐intensive science using the ADAGE framework

One of the main activities in data-intensive science is data analysis. Many popular technologies can assist scientists with isolated aspects of data analysis. Supporting analysis processes holistically is harder, because it means promoting system interoperability, integration and automation as well as scientific reproducibility and efficient data handling, and this presents many challenges. A common way to address these challenges is to integrate existing technologies efficiently so that together they meet scientists' analysis needs, an idea similar to the one behind science gateways. We believe this is essentially an exercise in software design, and that in many situations these challenges should be tackled from a software design perspective. Consequently, this paper reviews architectural design approaches that can be used to address these challenges and proposes a service-oriented framework called the Ad Hoc Data Grid Environment (ADAGE), which consists of an architectural pattern and its associated operational guidelines. The guidelines prescribe a number of activities, based on an iterative decomposition approach, for producing and evolving software architectures as user needs constantly change. The framework is demonstrated on a case study involving the analysis processes required to conduct financial event studies.

Copyright © 2014 John Wiley & Sons, Ltd.
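The abstract does not spell out the details of ADAGE's architectural pattern, so the following is only a hypothetical Python sketch of the general service-oriented idea it describes. Analysis steps are exposed as interchangeable services behind a uniform interface and composed into a process that can be refined over successive iterations. The dict-based `Service` convention, the `compose` helper, and the toy event-study services (which use a simple market-adjusted abnormal-return model) are illustrative assumptions. They are not the framework's API or the paper's case-study implementation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical convention for this sketch (not ADAGE's actual interface):
# a "service" is any callable that reads named inputs from a dict and
# returns a dict of named outputs.
Data = Dict[str, Any]
Service = Callable[[Data], Data]


def compose(*services: Service) -> Service:
    """Chain services into one process; the composite is itself a Service."""
    def process(data: Data) -> Data:
        for service in services:
            # Merge each service's outputs so later services can use them.
            data = {**data, **service(data)}
        return data
    return process


def simple_returns(prices: List[float]) -> List[float]:
    """Period-to-period simple returns from a price series."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def compute_returns(data: Data) -> Data:
    return {
        "stock_returns": simple_returns(data["stock_prices"]),
        "market_returns": simple_returns(data["market_prices"]),
    }


def abnormal_returns(data: Data) -> Data:
    # Market-adjusted model (an assumption for this sketch):
    # abnormal return = stock return - market return; CAR is their sum.
    ar = [r - m for r, m in zip(data["stock_returns"], data["market_returns"])]
    return {"abnormal_returns": ar, "car": sum(ar)}


def validate_prices(data: Data) -> Data:
    # A step added in a later iteration; it produces no new outputs.
    for key in ("stock_prices", "market_prices"):
        if any(p <= 0 for p in data[key]):
            raise ValueError(f"{key} must be positive")
    return {}


# First iteration: a coarse event-study step built from two services.
event_study = compose(compute_returns, abnormal_returns)

# Later iteration: the process gains a validation step, and the earlier
# composite is reused unchanged as a single service.
process = compose(validate_prices, event_study)

if __name__ == "__main__":
    data = {"stock_prices": [100.0, 110.0, 99.0],
            "market_prices": [100.0, 105.0, 105.0]}
    result = process(data)
    print(result["abnormal_returns"], result["car"])
```

Because a composite is itself a service, refining a step into finer sub-steps (or adding a step such as validation) leaves the rest of the process untouched. This loosely mirrors the iterative decomposition idea, although the paper's actual guidelines may differ.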