A Push-Based Prefetching for Remote Caching RAM Grid

RAM Grid is a grid computing technique for sharing distributed memory resources over a high-speed wide-area network: it exploits distributed computing nodes to provide remote memory to user nodes that are short of memory. Its performance is constrained by the high cost of network communication. To hide the latency of remote memory access and improve performance, the authors propose push-based prefetching, which enables the memory providers to push potentially useful pages to the user nodes. Each provider employs sequential pattern mining techniques, adapted to the characteristics of memory page access sequences, to locate useful memory pages for prefetching. The authors verify the effectiveness of the proposed method through trace-driven simulations.

DOI: 10.4018/jghpc.2009070801. This paper appears in the International Journal of Grid and High Performance Computing, Volume 1, Issue 4, edited by Emmanuel Udoh and Frank Zhigang Wang, pages 1-15, October-December 2009. © 2009, IGI Global.

... heterogeneous nodes that are connected by a high-speed wide-area network. Using RAM Grid for caching suits the loosely coupled distributed computing environment, which emphasizes "best effort" service and does not guarantee the degree of performance improvement. In the worst case, it is acceptable for performance to stay unchanged (neither rising nor dropping), while in a better network environment the gains are much larger.
Today's campus and enterprise networks are fast enough to meet the requirements of remote memory sharing, and rapidly developing network technologies will make our approach increasingly attractive. To facilitate later description, we classify the nodes in RAM Grid (Chu, et al., 2006) into five types. The user node is the consumer of remote memory, while the corresponding memory provider is called the busy node, which is drawn from the available nodes. A deputy node serves one user node; it acts as a broker and automatically searches for available nodes on behalf of that user node. The intermediate node neither provides nor consumes remote memory; it is ready to become a user node or an available node. To study the potential performance improvement, we compare the overheads of accessing an 8KB block over local disk, NFS and RAM Grid, where RAM Grid accesses remote memory through a wide-area network with 2ms round-trip latency and 2MB bandwidth. From Table 1 we can observe that the caching mechanism in DRACO reduces the overhead by only 25%~30% compared to local disk or NFS access, and that the major part of the data access overhead in DRACO comes from the network transmission cost (nearly 60%). Therefore, the performance of DRACO can be improved further if we reduce or hide some of the transmission cost. Prefetching is an approach to hiding the cost of low-speed media among different levels of storage devices. In this article, we employ prefetching in DRACO in order to improve its performance. Unlike traditional I/O devices, the busy nodes in DRACO, which provide remote memory for caching, often have spare CPU cycles. The busy nodes can therefore decide the prefetching policy and parameters by themselves, relieving the user nodes of DRACO, which are often dedicated to large computing tasks, of the prefetching work.
In contrast to traditional approaches, in which the data to prefetch are chosen by a rather simple algorithm on the user node, such a push-based prefetching scheme can be more effective.
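
The push-based idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: a simple successor-frequency count stands in for the sequential pattern mining the paper refers to, and every name here (`PushPrefetcher`, `window`, `min_support`, `max_push`, `simulate`) is hypothetical. The provider (busy node) records the page requests it serves, learns which pages tend to follow a given page, and pushes the likely successors to the user node without being asked.

```python
from collections import Counter, defaultdict, deque


class PushPrefetcher:
    """Provider-side predictor (illustrative only, names are assumptions)."""

    def __init__(self, window=2, min_support=2, max_push=2):
        self.window = window            # how many preceding requests count as context
        self.min_support = min_support  # minimum count before a pattern is trusted
        self.max_push = max_push        # upper bound on pages pushed per request
        self.successors = defaultdict(Counter)  # page -> counts of pages seen after it
        self.recent = deque(maxlen=window)      # the last `window` requested pages

    def record(self, page):
        """Update the pattern counts with one observed page request."""
        for prev in self.recent:
            if prev != page:
                self.successors[prev][page] += 1
        self.recent.append(page)

    def predict(self, page):
        """Return the pages to push after `page`, most frequent first."""
        ranked = self.successors[page].most_common()
        frequent = [p for p, count in ranked if count >= self.min_support]
        return frequent[: self.max_push]


def simulate(trace, prefetcher):
    """Replay a request trace and count requests satisfied by pushed pages."""
    pushed = set()  # pages already pushed to the user node and not yet used
    hits = 0
    for page in trace:
        if page in pushed:
            hits += 1
            pushed.discard(page)
        prefetcher.record(page)
        pushed.update(prefetcher.predict(page))
    return hits


if __name__ == "__main__":
    trace = [1, 2, 3] * 4  # a repeating access pattern, invented for illustration
    print(simulate(trace, PushPrefetcher(window=1, min_support=2, max_push=1)))
```

In this sketch all bookkeeping runs on the provider side, which mirrors the motivation in the text: the user node only issues requests, while the node with spare CPU cycles decides what to push. A real system would also have to bound the user node's buffer and account for pushed pages that are never used.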

[1] Courtenay T. Vaughan, et al. Application Performance on the Tri-Lab Linux Capacity Cluster - TLCC, 2010, Int. J. Distributed Syst. Technol.

[2] N. Mustafee. Grid Technology for Maximizing Collaborative Decision Management and Support: Advancing Effective Virtual Organizations, 2010.

[3] Yunhao Liu, et al. A distributed paging RAM grid system for wide-area memory sharing, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[4] Carmelo Ragusa. Business Grids, Infrastructuring the Future of ICT, 2010.

[5] Omer F. Rana, et al. Business Models for On-line Social Networks: Challenges and Opportunities, 2010, Int. J. Virtual Communities Soc. Netw.

[6] Michael Dahlin, et al. Cooperative caching: using remote client memory to improve file system performance, 1994, OSDI '94.

[7] Aïcha-Nabila Benharkat, et al. Towards a More Scalable Schema Matching: A Novel Approach, 2010, Int. J. Distributed Syst. Technol.

[8] Anna R. Karlin, et al. Implementing global memory management in a workstation cluster, 1995, SOSP.

[9] Y. Charlie Hu, et al. Program-Counter-Based Pattern Classification in Buffer Caching, 2004, OSDI.

[10] Heithem Abbes, et al. Parallelization of Littlewood-Richardson Coefficients Computation and its Integration into the BonjourGrid Meta-Desktop Grid Middleware, 2011, Int. J. Grid High Perform. Comput.

[11] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[12] Shih-Hao Hung, et al. Migrating Android Applications to the Cloud, 2011, Int. J. Grid High Perform. Comput.

[13] Alan Grigg, et al. A Scalable Approach to Real-Time System Timing Applications, 2010.

[14] Edward Y. Chang, et al. MEDIC: a memory and disk cache for multimedia clients, 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[15] Xiaoning Ding, et al. A Locality-Aware Cooperative Cache Management Protocol to Improve Network File System Performance, 2006, 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06).

[16] Jack Dongarra, et al. Handbook of Research on Scalable Computing Technologies, 2009.

[17] Ke Xu, et al. From Enabling to Ensuring Grid Workflows, 2009.

[18] John H. Hartman, et al. Efficient cooperative caching using hints, 1996, OSDI '96.

[19] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, 2001, Proceedings 17th International Conference on Data Engineering.

[20] Antonio Liotta, et al. Handbook of Research on P2P and Grid Systems for Service-oriented Computing: Models, Methodologies a, 2010.

[21] Sheng-De Wang, et al. Scalable Index and Data Management for Unstructured Peer-to-Peer Networks, 2010.

[22] Ivan Janciak, et al. Ontology-Based Construction of Grid Data Mining Workflows, 2008.

[23] Ghalem Belalem, et al. Load Balancing to Increase the Consistency of Replicas in Data Grids, 2010, Int. J. Distributed Syst. Technol.

[24] David J. Lilja, et al. Data prefetch mechanisms, 2000, CSUR.

[25] Alejandro P. Buchmann, et al. Principles and Applications of Distributed Event-Based Systems, 2010, Principles and Applications of Distributed Event-Based Systems.

[26] Cheng Wu, et al. Fuzzy Allocation of Fine-Grained Compute Resources for Grid Data Streaming Applications, 2010, Int. J. Grid High Perform. Comput.

[27] Ibrahim Matta, et al. BRITE: an approach to universal topology generation, 2001, MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[28] Ramakrishnan Srikant, et al. Mining Sequential Patterns: Generalizations and Performance Improvements, 1996, EDBT.

[29] Jim Griffioen, et al. Reducing File System Latency using a Predictive Approach, 1994, USENIX Summer.

[30] Ian Foster, et al. The Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition, 1998, The Grid 2, 2nd Edition.

[31] Sang Lyul Min, et al. Towards application/file-level characterization of block references: a case for fine-grained buffer management, 2000, SIGMETRICS '00.

[32] Jinjun Chen, et al. Quantitative Quality of Service for Grid Computing: Applications for Heterogeneity, Large-scale Distribution, and Dynamic Environments, 2009.

[33] Evangelos P. Markatos, et al. The Network RamDisk: Using remote memory on heterogeneous NOWs, 1999, Cluster Computing.

[34] Evangelos P. Markatos, et al. Implementation of a Reliable Remote Memory Pager, 1996, USENIX ATC.

[35] Ian T. Foster, et al. Condor-G: A Computation Management Agent for Multi-Institutional Grids, 2004, Cluster Computing.

[36] Maozhen Li, et al. Modeling Scalable Grid Information Services with Colored Petri Nets, 2010, Int. J. Grid High Perform. Comput.

[37] Yunhao Liu, et al. Parallel network RAM: effectively utilizing global cluster memory for large data-intensive parallel programs, 2004.

[38] Minglu Li, et al. The Cost-Based Resource Management in Combination with Qos For Grid Computing, 2009.

[39] Ian T. Foster, et al. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, 2001, Int. J. High Perform. Comput. Appl.

[40] Andreas Rausch, et al. Event-Based Realization of Dynamic Adaptive Systems, 2010, Principles and Applications of Distributed Event-Based Systems.

[41] Belabbes Yagoubi, et al. Dynamic Dependent Tasks Assignment for Grid Computing, 2011, Int. J. Grid High Perform. Comput.

[42] K. G. Srinivasa, et al. An Adaptive Scheduler Framework for Complex Workflow Jobs on Grid Systems, 2011, 2011 IEEE International Conference on High Performance Computing and Communications.