On the exploitation of loop-level parallelism in embedded applications
Milind Girkar | Xinmin Tian | Alexander V. Veidenbaum | Alexandru Nicolau | Hideki Saito | Arun Kejariwal
[1] Joshua S. Auerbach,et al. Concert/C: A Language for Distributed Programming , 1994, USENIX Winter.
[2] Richard S. Bird,et al. Notes on recursion elimination , 1977, CACM.
[3] Yat Sang Kwong. On reductions and livelocks in asynchronous parallel computation , 1982 .
[4] Yanhong A. Liu,et al. From recursion to iteration: what are the optimizations? , 1999, PEPM '00.
[5] Philip J. Hatcher,et al. Data-Parallel Programming on MIMD Computers , 1991, IEEE Trans. Parallel Distributed Syst.
[6] David L. Kuck,et al. The Structure of Computers and Computations , 1978 .
[7] Utpal Banerjee,et al. Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.
[8] Aart J. C. Bik. The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance , 2004 .
[9] Lance M. Berc,et al. Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst.
[10] Lance M. Berc,et al. Continuous profiling: where have all the cycles gone? , 1997, TOCS.
[11] Narain H. Gehani,et al. The concurrent C programming language , 1989 .
[12] Karandeep Singh,et al. LMPI: MPI for heterogeneous embedded distributed systems , 2006, 12th International Conference on Parallel and Distributed Systems - (ICPADS'06).
[13] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[14] Ahmed Amine Jerraya,et al. Automatic generation and targeting of application specific operating systems and embedded systems software , 2001, Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001.
[15] Ken Kennedy,et al. Conversion of control dependence to data dependence , 1983, POPL '83.
[16] Milind Girkar,et al. On the performance potential of different types of speculative thread-level parallelism (the DL version of this paper includes corrections that were not made available in the printed proceedings) , 2006, ICS '06.
[17] Dake Liu,et al. Network Processor for , 2003 .
[18] Paolo Faraboschi,et al. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools , 2004 .
[19] Richard Gerber. The Software Optimization Cookbook , 2002 .
[20] Arthur J. Bernstein,et al. Analysis of Programs for Parallel Processing , 1966, IEEE Trans. Electron. Comput.
[21] B. Ramakrishna Rau,et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing , 1981, MICRO 14.
[22] Constantine D. Polychronopoulos. Loop Coalescing: A Compiler Transformation for Parallel Machines , 1987, ICPP.
[23] Edward A. Lee. The problem with threads , 2006, Computer.
[24] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[25] Scott A. Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.
[26] Wayne H. Wolf,et al. The future of multiprocessor systems-on-chips , 2004, Proceedings. 41st Design Automation Conference, 2004.
[27] Jonathan Schaeffer,et al. From patterns to frameworks to parallel programs , 2002, Parallel Comput.
[28] Wayne H. Wolf,et al. Multiprocessor Systems-on-Chips , 2004, ISVLSI.
[29] Andrew S. Grimshaw. An Introduction to Parallel Object-Oriented Programming with Mentat , 1991 .
[30] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[31] Peng Zhao,et al. An integrated simdization framework using virtual vectors , 2005, ICS '05.
[32] Alice C. Parker,et al. SOS: Synthesis of application-specific heterogeneous multiprocessor systems , 2001, J. Parallel Distributed Comput.
[33] Scott A. Mahlke,et al. The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.
[34] Jaime H. Moreno,et al. A high-performance embedded DSP core with novel SIMD features , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
[35] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[36] Ahmed Amine Jerraya,et al. Automatic generation and targeting of application specific operating systems and embedded systems software , 2001, DATE '01.
[37] Tulika Mitra,et al. Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs , 1999, ISCA.
[38] Ayal Zaks,et al. Auto-vectorization of interleaved data for SIMD , 2006, PLDI '06.
[39] John Paul Shen,et al. Helper threads via virtual multithreading , 2004, IEEE Micro.
[40] Barbara M. Chapman,et al. Supercompilers for parallel and vector computers , 1990, ACM Press frontier series.
[41] Anoop Gupta,et al. COOL: An object-based language for parallel programming , 1994, Computer.
[42] H. P. E. Vranken,et al. TriMedia CPU 64 Architecture , .
[43] Amer Baghdadi,et al. Automatic generation of application-specific architectures for heterogeneous multiprocessor system-on-chip , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[44] G. H. Barnes,et al. A controllable MIMD architecture , 1986 .
[45] Gang Ren,et al. Optimizing data permutations for SIMD devices , 2006, PLDI '06.
[46] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev.
[47] Sri Parameswaran,et al. Application-specific heterogeneous multiprocessor synthesis using differential-evolution , 1998, Proceedings. 11th International Symposium on System Synthesis (Cat. No.98EX210).
[48] Rudolf Eigenmann,et al. Automatic program parallelization , 1993, Proc. IEEE.
[49] Ralph Johnson,et al. Design Patterns: Elements of Reusable Object-Oriented Software , 2019 .
[50] Scott Mahlke,et al. Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.
[51] Joseph A. Fisher,et al. Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.
[52] James R. Larus,et al. Software and the Concurrency Revolution , 2005, ACM Queue.
[53] Gurindar S. Sohi,et al. The Expandable Split Window Paradigm for Exploiting Fine-grain Parallelism , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[54] James R. Larus,et al. Branch prediction for free , 1993, PLDI '93.
[55] L. C. Smith. PASSION Runtime Library for Parallel I/O , 1994 .
[56] Rajiv Gupta,et al. Region Scheduling: An Approach for Detecting and Redistributing Parallelism , 1990, IEEE Trans. Software Eng.
[57] Srivaths Ravi,et al. Synthesis of application-specific heterogeneous multiprocessor architectures using extensible processors , 2005, 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design.
[58] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[59] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[60] Xin-Min Tian,et al. Intel OpenMP C++/Fortran Compiler for Hyper-Threading Technology: Implementation and Performance , 2002 .
[61] Milind Girkar,et al. Challenges in exploitation of loop parallelism in embedded applications , 2006, Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06).
[62] K. Mani Chandy,et al. CC++: A Declarative Concurrent Object Oriented Programming Notation , 1993 .
[63] Aart J. C. Bik. Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance , 2004 .
[64] Ralph E. Johnson,et al. Components, frameworks, patterns , 1997, SSR '97.
[65] Andy D. Pimentel,et al. TriMedia CPU64 architecture , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[66] B. Ramakrishna Rau,et al. Region-based compilation: Introduction, motivation, and initial experience , 2007, International Journal of Parallel Programming.
[67] Hermann Kopetz,et al. Real-time systems , 2018, CSC '73.
[68] Gang Ren,et al. An empirical study on the vectorization of multimedia applications for multimedia extensions , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[69] Franz Franchetti,et al. Short vector code generation for the discrete Fourier transform , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[70] Dennis Gannon,et al. Distributed pC++ Basic Ideas for an Object Parallel Language , 1993, Sci. Program.
[71] B. Ramakrishna Rau,et al. Instruction-level parallel processing: History, overview, and perspective , 2005, The Journal of Supercomputing.
[72] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[73] Mario Nemirovsky,et al. DISC: dynamic instruction stream computer , 1991, MICRO 24.
[74] Gurindar S. Sohi,et al. The expandable split window paradigm for exploiting fine-grain parallelism , 1992, ISCA '92.
[75] Robert D. Blumofe,et al. Hood: A user-level threads library for multiprogrammed multiprocessors , 1998 .
[76] James R. Larus,et al. Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst.
[77] Chris Ding,et al. ZioLib: A parallel I/O library , 2003 .
[78] A. Skjellum,et al. eMPI/eMPICH: embedding MPI , 1996, Proceedings. Second MPI Developer's Conference.
[79] Timothy G. Mattson,et al. A Pattern Language for Parallel Application Programs (Research Note) , 2000, Euro-Par.
[80] Richard E. Hank,et al. Region-based compilation: an introduction and motivation , 1995, MICRO 1995.
[81] Michael J. Flynn,et al. Very high-speed computing systems , 1966 .