Peruse and Profit: Estimating the Accelerability of Loops
Vijayalakshmi Srinivasan | William N. Sumner | Snehasish Kumar | Arrvindh Shriraman | Amirali Sharifian
[1] Lei Zhang,et al. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications , 2014, Journal of Computer Science and Technology.
[2] Graham R. Nudd,et al. Pace—A Toolset for the Performance Prediction of Parallel and Distributed Systems , 2000, Int. J. High Perform. Comput. Appl..
[3] J. David Morgenthaler,et al. Evaluating static analysis defect warnings on production software , 2007, PASTE '07.
[4] Brad Calder,et al. The Strong Correlation Between Code Signatures and Performance , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[5] Ron Cytron,et al. Interprocedural dependence analysis and parallelization , 1986, SIGPLAN '86.
[6] Dean M. Tullsen,et al. Data-triggered threads: Eliminating redundant computation , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[7] Lieven Eeckhout,et al. Quantifying the Impact of Input Data Sets on Program Behavior and its Applications , 2003, J. Instr. Level Parallelism.
[8] Paul B. Schneck,et al. Automatic recognition of vector and parallel operations in a higher level language , 1972, SIGP.
[9] Lieven Eeckhout,et al. Performance prediction based on inherent program similarity , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[10] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[11] P. Sadayappan,et al. Dynamic trace-based analysis of vectorization potential of applications , 2012, PLDI.
[12] Rudolf Eigenmann,et al. Idiom recognition in the Polaris parallelizing compiler , 1995, ICS '95.
[13] Sally A. McKee,et al. An Approach to Performance Prediction for Parallel Applications , 2005, Euro-Par.
[14] Pang-Ning Tan,et al. Receiver Operating Characteristic , 2009, Encyclopedia of Database Systems.
[15] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[16] Mark N. Wegman,et al. Constant propagation with conditional branches , 1985, POPL.
[17] Ryan N. Rakvic,et al. The Fuzzy Correlation between Code and Performance Predictability , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[18] Michael Hicks,et al. LOCKSMITH: Practical static race detection for C , 2011, TOPLAS.
[19] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SIGKDD Explorations.
[20] Somesh Jha,et al. Static analysis and compiler design for idempotent processing , 2012, PLDI.
[21] Rajeev Barua,et al. AESOP: The Autoparallelizing Compiler for Shared Memory Computers , 2013.
[22] David M. Brooks,et al. ISA-independent workload characterization and its implications for specialized architectures , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[23] Jiayuan Meng,et al. Improving GPU Performance Prediction with Data Transfer Modeling , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[24] Stephen A. Edwards,et al. Computation vs. memory systems: pinning down accelerator bottlenecks , 2010, ISCA'10.
[25] Erik R. Altman,et al. Predicting GPU Performance from CPU Runs Using Machine Learning , 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.
[26] Karthikeyan Sankaralingam,et al. DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing , 2012, IEEE Micro.
[27] D. Kibler,et al. Instance-based learning algorithms , 1991, Machine Learning.
[28] Elie Shaccour. ELI-C: A Loop-level Workload Characterization Tool , 2014.
[29] Yoav Freund,et al. The Alternating Decision Tree Learning Algorithm , 1999, ICML.
[30] James R. Larus,et al. Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[31] Karthikeyan Sankaralingam,et al. Idempotent code generation: Implementation, analysis, and evaluation , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[32] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[33] Scott B. Baden,et al. Modeling and predicting application performance on hardware accelerators , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[34] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[35] Pradeep Dubey,et al. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors , 2011, Proc. VLDB Endow..
[36] Geoff Holmes,et al. Multiclass Alternating Decision Trees , 2002, ECML.
[37] Gu-Yeon Wei,et al. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[38] Lizy Kurian John,et al. A Performance Counter Based Workload Characterization on Blue Gene/P , 2008, 2008 37th International Conference on Parallel Processing.
[39] William Pugh,et al. Uniform techniques for loop optimization , 1991, ICS '91.
[40] Lieven Eeckhout,et al. Microarchitecture-Independent Workload Characterization , 2007, IEEE Micro.
[41] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] Kenneth A. Ross,et al. Navigating big data with high-throughput, energy-efficient data partitioning , 2013, ISCA.
[43] Robert E. Schapire,et al. A Brief Introduction to Boosting , 1999, IJCAI.
[44] Toshio Nakatani,et al. A new idiom recognition framework for exploiting hardware-assist instructions , 2006, ASPLOS XII.
[45] Thomas Fahringer. Automatic Performance Prediction of Parallel Programs , 1996, Springer US.
[46] Luis Ceze,et al. Neural Acceleration for General-Purpose Approximate Programs , 2014, IEEE Micro.
[47] Antonia Zhai,et al. Exploring speculative parallelism in SPEC2006 , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[48] Laura Carrington,et al. PIR: PMaC's Idiom Recognizer , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[49] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[50] T. K. Prakash,et al. Performance Characterization of SPEC CPU 2006 Benchmarks on Intel Core 2 Duo Processor , .
[51] Saturnino Garcia,et al. Kismet: parallel speedup estimates for serial programs , 2011, OOPSLA '11.
[52] Babak Falsafi,et al. Meet the walkers: Accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[53] Ken Kennedy,et al. Practical dependence testing , 1991, PLDI '91.
[54] Karthikeyan Sankaralingam,et al. iGPU: Exception support and speculative execution on GPUs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[55] David A. Padua,et al. Dependence graphs and compiler optimizations , 1981, POPL '81.
[56] Daniel Cordes,et al. A Fast and Precise Static Loop Analysis Based on Abstract Interpretation, Program Slicing and Polytope Models , 2009, 2009 International Symposium on Code Generation and Optimization.
[57] Xiaojin Zhu,et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).