Experimental analysis of operating system jitter caused by page reclaim

Operating system jitter is one source of runtime overhead in high-performance computing applications. Many such applications issue bursts of I/O, and these accesses consume a large amount of memory. When the Linux kernel runs low on free memory, it wakes special kernel threads to reclaim memory pages. If these threads are woken frequently, application performance degrades, both because the threads consume resources and because the application incurs more page faults and more migrations between CPU cores. In this study, we empirically analyze the impact of jitter caused by page reclaim and propose a method for reducing it. The proposed method reclaims memory pages before the kernel threads would run, and it reclaims more pages at a time than the kernel threads do, which reduces both the frequency of page reclaim and the impact of jitter. We conducted experiments using practical weather forecast software; the results showed that the proposed method minimized the performance degradation caused by jitter.
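
The claim that reclaiming more pages at a time reduces how often reclaim runs can be illustrated with a toy model. The sketch below is not the method evaluated in the study and does not touch the kernel. It assumes a hypothetical allocator that takes one page per step and runs one reclaim event whenever free pages fall to a threshold. The function name, the threshold, and the batch sizes are all illustrative assumptions.

```python
def count_reclaim_events(total_allocs, free_pages, threshold, batch):
    """Count reclaim events in a toy model (hypothetical, not kernel code).

    Each step allocates one page. Whenever the number of free pages
    drops to `threshold`, one reclaim event frees `batch` pages.
    """
    events = 0
    for _ in range(total_allocs):
        free_pages -= 1
        if free_pages <= threshold:
            free_pages += batch  # one reclaim pass frees `batch` pages
            events += 1
    return events


# Same workload, two assumed batch sizes: a small one standing in for
# the kernel threads, a large one standing in for reclaiming in bulk.
small_batch_events = count_reclaim_events(
    10_000, free_pages=1_000, threshold=100, batch=32
)
large_batch_events = count_reclaim_events(
    10_000, free_pages=1_000, threshold=100, batch=1_024
)

if __name__ == "__main__":
    print("small batch:", small_batch_events)
    print("large batch:", large_batch_events)
```

In this model the number of reclaim events falls roughly in proportion to the batch size. The model ignores the cost of each reclaim pass, so it says nothing about how large a batch is worthwhile in practice.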
