Modeling Superscalar Processor Memory-Level Parallelism

This paper proposes an analytical model to predict Memory-Level Parallelism (MLP) in a superscalar processor. We profile the workload once and measure a set of distributions that characterize its inherent memory behavior. From these distributions we generate a virtual instruction stream, over which we run an abstract MLP model to predict the MLP of a particular micro-architecture with a given ROB size, LLC size, number of MSHRs, and stride-based prefetcher. Experimental evaluation shows that the proposed model reduces the average modeling error from 16.9 percent for previous work to 3.6 percent.
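The pipeline described above (profile once, synthesize a virtual instruction stream, then evaluate an MLP model for a chosen configuration) can be illustrated with a deliberately simplified sketch. It is not the proposed model. The names `WorkloadProfile`, `generate_virtual_stream`, and `estimate_mlp` are hypothetical. Two scalar probabilities stand in for the profiled distributions, LLC size enters only through the assumed miss rate, load dependences are ignored, and the stride prefetcher is omitted. The MLP estimate is a window-based approximation: a miss opens an epoch, later misses within one ROB's worth of instructions overlap with it, and the MSHR count caps how many can be outstanding at once.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    # Hypothetical stand-ins for the profiled distributions; a real profile
    # would capture far richer memory behavior than two probabilities.
    load_fraction: float   # probability that an instruction is a load
    llc_miss_rate: float   # probability that a load misses in the LLC


def generate_virtual_stream(profile, length, seed=0):
    """Return a list of booleans in which True marks an LLC-missing load."""
    rng = random.Random(seed)
    stream = []
    for _ in range(length):
        is_load = rng.random() < profile.load_fraction
        stream.append(is_load and rng.random() < profile.llc_miss_rate)
    return stream


def estimate_mlp(stream, rob_size, num_mshrs):
    """Average number of LLC misses serviced together per batch.

    The first unaccounted miss opens an epoch; every later miss that lies
    fewer than rob_size instructions after it joins the same group. At most
    num_mshrs misses can be outstanding at once, so a group of k misses is
    serviced in ceil(k / num_mshrs) batches. MLP = misses / batches.
    """
    miss_positions = [i for i, miss in enumerate(stream) if miss]
    total_batches = 0
    j = 0
    while j < len(miss_positions):
        window_end = miss_positions[j] + rob_size
        k = j
        # Collect all misses that fit in the ROB window opened by miss j.
        while k < len(miss_positions) and miss_positions[k] < window_end:
            k += 1
        total_batches += math.ceil((k - j) / num_mshrs)
        j = k
    return len(miss_positions) / total_batches if total_batches else 0.0


if __name__ == "__main__":
    profile = WorkloadProfile(load_fraction=0.3, llc_miss_rate=0.05)
    stream = generate_virtual_stream(profile, 100_000, seed=42)
    for rob in (64, 128, 256):
        print(rob, round(estimate_mlp(stream, rob_size=rob, num_mshrs=10), 2))
```

Because the stream is generated once per workload and only `rob_size` and `num_mshrs` change between calls, a design sweep reuses the same stream, which mirrors the profile-once idea in the abstract. Larger ROBs let more misses fall into one window, while the MSHR cap bounds the estimate at `num_mshrs`.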
