ADAGE: a framework for supporting user-driven ad-hoc data analysis processes

Data analysis is an important part of the scientific process carried out by domain experts in data-intensive science. Although many software tools and systems are available, combining them to conduct complex analyses is very difficult for non-IT experts. The main contribution of this paper is the Ad-hoc DAta Grid Environment (ADAGE) framework, an open architectural framework based on service-oriented computing (SOC) principles that can guide the development of domain-specific problem-solving environments or systems to support data analysis activities. Through an application of the ADAGE framework and a prototype implementation that supports the analysis of financial news and market data, this paper demonstrates that systems developed based on the framework allow users to effectively express common analysis processes. This paper also outlines some limitations as well as avenues for future research.
