Timing Models for Fast Embedded Software Performance Analysis
