EPSim-C: A Parallel Epoch-Based Cycle-Accurate Microarchitecture Simulator Using Cloud Computing

Recently, computing platforms have been configured on a large scale to satisfy the diverse requirements of emerging applications such as big data and graph processing, neural networks, and speech recognition. In these platforms, each computing node consists of a multicore processor, an accelerator, and a complex memory hierarchy, and is connected to other nodes through a variety of high-performance networks. To date, researchers have relied on cycle-accurate simulators to evaluate the performance of computer systems in detail. However, as architectures have become more complex, so have the simulators that model them. Simulating modern multi-core, multi-node, and datacenter-scale systems with deep memory hierarchies, new memory technologies, and new interconnects is now too slow to be practical, which makes it very difficult to use such simulators in the research and development of next-generation computer systems. To address this problem, we previously presented EPSim (Epoch-based Simulator), which divides a simulation run into independently executable sections, called epochs, and runs them in parallel on a multicore platform; however, the resulting speedup is limited by the resources of a single multicore platform. In this paper, to overcome these computing resource limitations, we propose EPSim-C (EPSim on Cloud), a novel simulator that extends EPSim and achieves higher performance by using a cloud computing platform. EPSim-C performs the epoch-based executions in a massively parallel fashion using MapReduce on Hadoop-based systems. In our experiments, EPSim-C achieved a maximum speedup of 87.0× and an average speedup of 46.1× using 256 cores.
To the best of our knowledge, EPSim-C is the only existing approach to accelerating a cycle-accurate simulator on cloud platforms; its significant performance improvement allows researchers to model and study current and future cutting-edge computing platforms using real workloads.
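The epoch-based scheme described above maps naturally onto the MapReduce model, assuming each epoch becomes one map task and the reduce step merges per-epoch statistics into whole-run totals. The Python sketch below illustrates only that structure; it is not EPSim-C's implementation. The names `Epoch`, `EpochStats`, `split_into_epochs`, `simulate_epoch`, `merge`, and `run_epoch_parallel`, the fixed epoch length, and the toy CPI model are all hypothetical, and a thread pool stands in for Hadoop map tasks running on cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Epoch:
    """Hypothetical epoch descriptor: one slice of the dynamic instruction stream."""
    epoch_id: int
    start_insn: int  # first instruction of the epoch
    num_insns: int   # number of instructions simulated in this epoch


@dataclass(frozen=True)
class EpochStats:
    """Statistics produced by simulating one epoch (or merged over many)."""
    instructions: int
    cycles: int


def split_into_epochs(total_insns: int, epoch_len: int) -> list[Epoch]:
    """Divide a simulation run into fixed-length, independently runnable epochs."""
    return [
        Epoch(i, start, min(epoch_len, total_insns - start))
        for i, start in enumerate(range(0, total_insns, epoch_len))
    ]


def simulate_epoch(epoch: Epoch) -> EpochStats:
    """Map step: hypothetical stand-in for a cycle-accurate run of one epoch.

    A real simulator would restore the architectural state at epoch.start_insn
    and model the pipeline in detail; here a toy CPI of 1.00, 1.25, or 1.50
    (chosen by epoch_id) is used instead.
    """
    cpi_x100 = 100 + (epoch.epoch_id % 3) * 25
    return EpochStats(epoch.num_insns, epoch.num_insns * cpi_x100 // 100)


def merge(a: EpochStats, b: EpochStats) -> EpochStats:
    """Reduce step: combine per-epoch statistics into whole-run totals."""
    return EpochStats(a.instructions + b.instructions, a.cycles + b.cycles)


def run_epoch_parallel(total_insns: int, epoch_len: int, workers: int) -> EpochStats:
    """Split the run, simulate epochs concurrently, then reduce the results."""
    epochs = split_into_epochs(total_insns, epoch_len)
    # Threads stand in for distributed map tasks; results keep epoch order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(simulate_epoch, epochs))
    return reduce(merge, partial, EpochStats(0, 0))


if __name__ == "__main__":
    total = run_epoch_parallel(total_insns=1_000_000, epoch_len=100_000, workers=4)
    print(total, f"IPC={total.instructions / total.cycles:.3f}")
```

Summing cycles across epochs assumes the statistics are additive; this sketch ignores microarchitectural state (caches, branch predictors) carried across epoch boundaries, which a real epoch-based simulator would have to account for in some way.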
