Determining Performance Boundaries and Automatic Loop Optimization of High-Level System Specifications
[1] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[2] Saturnino Garcia,et al. Kremlin: like gprof, but for parallelization , 2011, PPoPP '11.
[3] Jason Cong,et al. High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[4] Ana Balevic,et al. Exploiting multi-level parallelism in streaming applications for heterogeneous platforms with GPUs , 2013 .
[5] Edward A. Lee,et al. Dataflow process networks , 1995, Proc. IEEE.
[6] Edward D. Lazowska,et al. Speedup Versus Efficiency in Parallel Systems , 1989, IEEE Trans. Computers.
[7] Saturnino Garcia,et al. Parkour: Parallel Speedup Estimates for Serial Programs , 2011, HotPar.
[8] Susan L. Graham,et al. Gprof: A call graph execution profiler , 1982, SIGPLAN '82.
[9] Jerónimo Castrillón Mazo. Programming heterogeneous MPSoCs: tool flows to close the software productivity gap , 2013 .
[10] Koen De Bosschere,et al. A profile-based tool for finding pipeline parallelism in sequential programs , 2010, Parallel Comput..
[11] Michael J. Flynn,et al. Some Computer Organizations and Their Effectiveness , 1972, IEEE Transactions on Computers.
[12] Alexandru Turjan,et al. Translating affine nested-loop programs to process networks , 2004, CASES '04.
[13] Xingfu Wu,et al. Performance Evaluation, Prediction and Visualization of Parallel Systems , 1999, The Kluwer International Series on Asian Studies in Computer and Information Science.
[14] Sven Verdoolaege,et al. Polyhedral Process Networks , 2010, Handbook of Signal Processing Systems.
[15] Allen D. Malony,et al. ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis , 2003, Euro-Par.
[16] Roel Meeuws. Quantitative hardware prediction modeling for hardware/software co-design , 2012 .
[17] Manoj Kumar,et al. Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.
[18] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[19] Ralph Duncan,et al. A survey of parallel computer architectures , 1990, Computer.
[20] Gilles Kahn,et al. Coroutines and Networks of Parallel Processes , 1977, IFIP Congress.
[21] Lei Gao,et al. TotalProf: a fast and accurate retargetable source code profiler , 2009, CODES+ISSS '09.
[22] Keshab K. Parhi,et al. VLSI digital signal processing systems , 1999 .
[23] Diomidis Spinellis,et al. Global Analysis and Transformations in Preprocessed Languages , 2003, IEEE Trans. Software Eng..
[24] Barton P. Miller,et al. Critical path analysis for the execution of parallel and distributed programs , 1988, [1988] Proceedings. The 8th International Conference on Distributed.
[25] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[26] Henk Corporaal,et al. Parallelization of while loops in nested loop programs for shared-memory multiprocessor systems , 2011, 2011 Design, Automation & Test in Europe.
[27] Sjoerd Meijer,et al. Transformations for polyhedral process networks , 2010 .
[28] Saturnino Garcia,et al. Kismet: parallel speedup estimates for serial programs , 2011, OOPSLA '11.
[29] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[30] van Haastregt,et al. Estimation and optimization of the performance of polyhedral process networks , 2013 .
[31] William G. Griswold,et al. The design of whole-program analysis tools , 1996, Proceedings of IEEE 18th International Conference on Software Engineering.
[32] Koen Bertels,et al. QUAD - A Memory Access Pattern Analyser , 2010, ARC.
[33] Ed F. Deprettere,et al. Compaan: deriving process networks from Matlab for embedded signal processing architectures , 2000, CODES '00.
[34] Cédric Bastoul,et al. Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[35] John L. Hennessy,et al. The Future of Systems Research , 1999, Computer.
[36] Melanie Kambadur,et al. Harmony: Collection and analysis of parallel block vectors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[37] Andrei Alexandrescu,et al. Modern C++ design: generic programming and design patterns applied , 2001 .
[38] Arjan J. C. van Gemund. Performance Modeling of Parallel Systems , 1996 .
[39] J. Larus. Whole program paths , 1999, PLDI '99.
[40] Björn Karlsson,et al. Beyond the C++ Standard Library: An Introduction to Boost , 2005 .
[41] Zhen Li,et al. Discovery of Potential Parallelism in Sequential Programs , 2013, 2013 42nd International Conference on Parallel Processing.
[42] K. Bertels,et al. Profile-guided application partitioning for heterogeneous reconfigurable platforms , 2012, The 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS 2012).
[43] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[44] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[45] A. Chahar,et al. Compile time analysis for hardware transactional memory architectures , 2012 .
[46] Ed F. Deprettere,et al. Daedalus: Toward composable multimedia MP-SoC design , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[47] Giovanni De Micheli,et al. Synthesis and Optimization of Digital Circuits , 1994 .
[48] Todor Stefanov,et al. Translating affine nested-loop programs with dynamic loop bounds into Polyhedral Process Networks , 2010, 2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia.