An autonomic framework for enhancing the quality of data grid services

Data grid services are used to meet the growing demands of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make managing and tuning these data services very complex. Furthermore, current high-performance I/O approaches are highly complex and have specific features that usually require specialized administrator skills. Autonomic computing can help manage this complexity. This paper describes an autonomic subsystem that provides self-management features aimed at alleviating the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system is not usually required immediately. Therefore, performance improvements concern not only current I/O accesses but also future ones, since the actual data access usually occurs later. However, the exact time of the next I/O operations is unknown. Our approach therefore proposes a long-term prediction designed to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.
