SPIRAL: Extreme Performance Portability
Tze Meng Low | Franz Franchetti | José M. F. Moura | James C. Hoe | Jeremy R. Johnson | Daniele G. Spampinato | Markus Püschel | Doru Thom Popovici | Richard M. Veras
[1] Rudolf Eigenmann,et al. PEAK—a fast and effective performance tuning system via compiler optimization orchestration , 2008, TOPL.
[2] Franz Franchetti,et al. Computer Generation of Hardware for Linear Digital Signal Processing Transforms , 2012, TODE.
[3] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[4] Jeremy Kepner,et al. Graph Algorithms in the Language of Linear Algebra , 2012, Software, environments, tools.
[5] Daisuke Takahashi,et al. The HPC Challenge (HPCC) benchmark suite , 2006, SC.
[6] Manuela M. Veloso,et al. Learning to Construct Fast Signal Processing Implementations , 2002, J. Mach. Learn. Res..
[7] Franz Franchetti,et al. Computer generation of fast fourier transforms for the cell broadband engine , 2009, ICS '09.
[8] Chris-Kriton Skylaris,et al. Introducing ONETEP: linear-scaling density functional simulations on parallel computers. , 2005, The Journal of chemical physics.
[9] Serge Winitzki,et al. YACAS: A Do-It-Yourself Symbolic Algebra Environment , 2002, AISC.
[10] Daniele G. Spampinato,et al. A basic linear algebra compiler for structured matrices , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[11] Jan Vitek,et al. Terra: a multi-stage language for high-performance computing , 2013, PLDI.
[12] Krzysztof Czarnecki,et al. DSL Implementation in MetaOCaml, Template Haskell, and C++ , 2003, Domain-Specific Program Generation.
[13] Paolo Bientinesi,et al. Knowledge-Based Automatic Generation of Partitioned Matrix Expressions , 2011, CASC.
[14] Franz Franchetti,et al. A Rewriting System for the Vectorization of Signal Transforms , 2006, VECPAR.
[15] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[16] Franz Franchetti,et al. HAMLeT: Hardware accelerated memory layout transform within 3D-stacked DRAM , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[17] Martin Odersky,et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.
[18] Daisuke Takahashi,et al. Japanese Autotuning Research: Autotuning Languages and FFT , 2018, Proceedings of the IEEE.
[19] José M. F. Moura,et al. Fast automatic software implementations of FIR filters , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
[20] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[21] James C. Hoe,et al. Automatic generation of streaming datapaths for arbitrary fixed permutations , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[22] Tze Meng Low,et al. FFTX and SpectralPack: A First Look , 2018, 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW).
[23] Franz Franchetti,et al. Efficient Utilization of SIMD Extensions , 2005, Proceedings of the IEEE.
[24] Franz Franchetti,et al. Domain-specific library generation for parallel software and hardware platforms , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[25] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[26] Franz Franchetti,et al. Generating high performance pruned FFT implementations , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Franz Franchetti,et al. Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform , 2006, SC.
[28] Franz Franchetti,et al. Automatic SIMD vectorization of fast fourier transforms for the larrabee and AVX instruction sets , 2011, ICS '11.
[29] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[30] Sriram Krishnamoorthy,et al. Parametric multi-level tiling of imperfectly nested loops , 2009, ICS.
[31] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..
[32] Basilio B. Fraguela,et al. Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[33] Tze Meng Low,et al. High Assurance Code Generation for Cyber-Physical Systems , 2017, 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE).
[34] Franz Franchetti,et al. Operator Language: A Program Generation Framework for Fast Kernels , 2009, DSL.
[35] Martin Odersky,et al. Spiral in scala: towards the systematic construction of generators for performance libraries , 2014, GPCE '13.
[36] Franz Franchetti,et al. Discrete fourier transform on multicore , 2009, IEEE Signal Processing Magazine.
[37] Robert A. van de Geijn,et al. The science of deriving dense linear algebra algorithms , 2005, TOMS.
[38] Doru-Thom Popovici,et al. Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[39] Roberto Erick Lopez-Herrejon,et al. Generating product-lines of product-families , 2002, Proceedings 17th IEEE International Conference on Automated Software Engineering.
[40] Victor Eijkhout,et al. Self-Adapting Linear Algebra Algorithms and Software , 2005, Proceedings of the IEEE.
[41] R. C. Whaley,et al. Automatically Tuned Linear Algebra Software (ATLAS) , 2011, Encyclopedia of Parallel Computing.
[42] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[43] A.J. Viterbi. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm , 1967, IEEE Trans. Inf. Theory.
[44] Richard Veras,et al. A stencil compiler for short-vector SIMD architectures , 2013, ICS '13.
[45] Benoît Meister,et al. R-Stream Compiler , 2011, Encyclopedia of Parallel Computing.
[46] Franz Franchetti,et al. SIMD Vectorization of Non-Two-Power Sized FFTs , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[47] Markus Püschel,et al. Mechanical Derivation of Fused Multiply–Add Algorithms for Linear Transforms , 2007, IEEE Transactions on Signal Processing.
[48] Franz Franchetti,et al. Optimized parallel distribution load flow solver on commodity multi-core CPU , 2012, 2012 IEEE Conference on High Performance Extreme Computing.
[49] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[50] Markus Püschel,et al. Offline library adaptation using automatically generated heuristics , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[51] Tze Meng Low,et al. High-Assurance SPIRAL: End-to-End Guarantees for Robot and Car Control , 2017, IEEE Control Systems.
[52] Tobias Gysi,et al. STELLA: a domain-specific tool for structured grid methods in weather and climate models , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[53] Doru-Thom Popovici,et al. First look: Linear algebra-based triangle counting without matrix multiplication , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[54] Cleve B. Moler,et al. Numerical computing with MATLAB , 2004 .
[55] Franz Franchetti,et al. Formal datapath representation and manipulation for implementing DSP transforms , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[56] Doru-Thom Popovici,et al. Mixed data layout kernels for vectorized complex arithmetic , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[57] Don S. Batory,et al. Achieving Extensibility Through Product-Lines and Domain-Specific Languages: A Case Study , 2000, ICSR.
[58] Thomas Holenstein,et al. Optimal Circuits for Streamed Linear Permutations Using RAM , 2016, FPGA.
[59] David A. Padua,et al. Programming for Locality and Parallelism with Hierarchically Tiled Arrays , 2003, LCPC.
[60] Ken Kennedy,et al. Automatic Type-Driven Library Generation for Telescoping Languages , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[61] Franz Franchetti,et al. Short Vector Code Generation and Adaptation for DSP Algorithms , 2003 .
[62] Armando Solar-Lezama,et al. Programming by sketching for bit-streaming programs , 2005, PLDI '05.
[63] Jeremy Johnson,et al. A Haskell compiler for signal transforms , 2017, GPCE.
[64] Yevgen Voronenko,et al. Adaptive Mapping of Linear DSP Algorithms to Fixed-Point Arithmetic , 2004 .
[65] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[66] Robert A. van de Geijn,et al. Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer , 2012, VECPAR.
[67] Markus Püschel,et al. A Basic Linear Algebra Compiler , 2014, CGO '14.
[68] Tze Meng Low,et al. Optimizing FFT Resource Efficiency on FPGA using High-level Synthesis , 2017 .
[69] Eran Yahav,et al. Inferring Synchronization under Limited Observability , 2009, TACAS.
[70] K. J. Gough. Little language processing, an alternative to courses on compiler construction , 1981, SGCS.
[71] Katherine Yelick,et al. UPC Language Specifications V1.1.1 , 2003 .
[72] Uday Bondhugula,et al. PLuTo: A Practical and Fully Automatic Polyhedral Program Optimization System , 2015 .
[73] Franz Franchetti,et al. Linear Transforms : From Math to Efficient Hardware Extended , 2008 .
[74] José M. F. Moura,et al. Fast Automatic Generation of DSP Algorithms , 2001, International Conference on Computational Science.
[75] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[76] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[77] Krzysztof Czarnecki,et al. Generative programming - methods, tools and applications , 2000 .
[78] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[79] Franz Franchetti,et al. High-performance synthetic aperture radar image formation on commodity multicore architectures , 2009, Defense + Commercial Sensing.
[80] Paolo Bientinesi,et al. Automatic Generation of Loop-Invariants for Matrix Operations , 2011, 2011 International Conference on Computational Science and Its Applications.
[81] Franz Franchetti,et al. Mathematical foundations of the GraphBLAS , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[82] Manuela M. Veloso,et al. Automating the modeling and optimization of the performance of signal transforms , 2002, IEEE Trans. Signal Process..
[83] James C. Hoe,et al. Permuting streaming data using RAMs , 2009, JACM.
[84] A.J. Viterbi. A personal history of the Viterbi algorithm , 2006, IEEE Signal Processing Magazine.
[85] David S. Wise,et al. Generic support of algorithmic and structural recursion for scientific computing , 2009, Int. J. Parallel Emergent Distributed Syst..
[86] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[87] Markus Püschel,et al. Bandit-based optimization on graphs with application to library performance tuning , 2009, ICML '09.
[88] Franz Franchetti,et al. Autotuning a Random Walk Boolean Satisfiability Solver , 2011, ICCS.
[89] W. Taha,et al. Plenary talk III Domain-specific languages , 2008, 2008 International Conference on Computer Engineering & Systems.
[90] Franz Franchetti,et al. How to Write Fast Numerical Code: A Small Introduction , 2007, GTTSE.
[91] Douglas R. Smith. Mechanizing the development of software , 1991 .
[92] Doru-Thom Popovici,et al. Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[93] Tiark Rompf,et al. How to Architect a Query Compiler, Revisited , 2018, SIGMOD Conference.
[94] Franz Franchetti,et al. Real-time software implementation of an IEEE 802.11a baseband receiver on Intel multicore , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[95] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[96] Siegfried Benkner,et al. Compiling High Performance Fortran for distributed-memory architectures , 1999, Parallel Comput..
[97] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[98] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.
[99] Franz Franchetti,et al. Discrete Fourier Transform Compiler : From Mathematical Representation to Efficient Hardware , 2007 .
[100] Cédric Bastoul,et al. Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[101] Richard Veras,et al. When polyhedral transformations meet SIMD code generation , 2013, PLDI.
[102] David A. Padua,et al. Programming for parallelism and locality with hierarchically tiled arrays , 2006, PPoPP '06.
[103] Jeremy R. Johnson,et al. Automatic derivation and implementation of fast convolution algorithms , 2004, J. Symb. Comput..
[104] Markus Püschel,et al. Computer Generation of General Size Linear Transform Libraries , 2009, 2009 International Symposium on Code Generation and Optimization.
[105] Franz Franchetti,et al. System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries , 2008, AMAST.
[106] Franz Franchetti,et al. Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P , 2012, VECPAR.
[107] James C. Hoe,et al. Fast and accurate resource estimation of automatically generated custom DFT IP cores , 2006, FPGA '06.
[108] Franz Franchetti,et al. Generating FPGA-Accelerated DFT Libraries , 2007 .
[109] Franz Franchetti,et al. Short vector code generation for the discrete Fourier transform , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[110] Michael F. P. O'Boyle,et al. MILEPOST GCC: machine learning based research compiler , 2008 .
[111] Jan Maluszynski,et al. Logic, Programming and Prolog (2ed) , 1995 .
[112] Tze Meng Low,et al. Optimizing Space Time Adaptive Processing through accelerating memory-bounded operations , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[113] Hao Shen. Generation of a Fast JPEG 2000 Encoder using SPIRAL , 2008 .
[114] Paolo Bientinesi,et al. Program generation for small-scale linear algebra applications , 2018, CGO.
[115] José M. F. Moura,et al. Automatic implementation and platform adaptation of discrete filtering and wavelet algorithms , 2004 .
[116] Calvin Lin,et al. An annotation language for optimizing software libraries , 1999, DSL '99.
[117] Torsten Hoefler,et al. Polly-ACC Transparent compilation to heterogeneous hardware , 2016, ICS.
[118] R. W. Johnson,et al. A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures , 1990 .
[119] C. Loan. Computational Frameworks for the Fast Fourier Transform , 1992 .
[120] Jon Louis Bentley,et al. Programming pearls: little languages , 1986, CACM.
[121] Mary W. Hall,et al. CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .
[122] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[123] Sergei Gorlatch,et al. High performance stencil code generation with Lift , 2018, CGO.
[124] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett..
[125] Chi-Bang Kuan,et al. Automated Empirical Optimization , 2011, Encyclopedia of Parallel Computing.
[126] Elizabeth R. Jessup,et al. Reliable Generation of High-Performance Matrix Algebra , 2012, ACM Trans. Math. Softw..
[127] Jack J. Dongarra,et al. An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.
[128] Don H. Johnson,et al. Gauss and the history of the fast Fourier transform , 1985 .
[129] Elizabeth R. Jessup,et al. Build to order linear algebra kernels , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[130] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[131] Arvind,et al. What is Bluespec? , 2009, SIGD.
[132] John Salvatier,et al. Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.
[133] Paolo Bientinesi,et al. A Domain-Specific Compiler for Linear Algebra Operations , 2012, VECPAR.
[134] Tiark Rompf,et al. Staging for generic programming in space and time , 2017, GPCE.
[135] David A. Padua,et al. SPL: a language and compiler for DSP algorithms , 2001, PLDI '01.
[136] Manuela M. Veloso,et al. Focused optimization for online detection of anomalous regions , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[137] Robert A. van de Geijn,et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality , 2015, ACM Trans. Math. Softw..
[138] M. Puschel,et al. FFT Program Generation for Shared Memory: SMP and Multicore , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[139] André Platzer,et al. KeYmaera: A Hybrid Theorem Prover for Hybrid Systems (System Description) , 2008, IJCAR.
[140] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[141] Franz Franchetti,et al. Generating SIMD Vectorized Permutations , 2008, CC.
[142] Franz Franchetti,et al. FFT Compiler: from math to efficient hardware HLDVT invited short paper , 2007, 2007 IEEE International High Level Design Validation and Test Workshop.
[143] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[144] Paul H. J. Kelly,et al. Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code , 2015, Comput. Phys. Commun..
[145] Franz Franchetti,et al. Algebraic description and automatic generation of multigrid methods in SPIRAL , 2017, Concurr. Comput. Pract. Exp..
[146] Franz Franchetti,et al. Computer Generation of Efficient Software Viterbi Decoders , 2010, HiPEAC.
[147] Amir Shaikhha,et al. How to Architect a Query Compiler , 2016, SIGMOD Conference.
[148] Franz Franchetti,et al. Formal loop merging for signal transforms , 2005, PLDI '05.
[149] Stephen Wolfram,et al. The Mathematica book, 5th Edition , 2003 .
[150] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[151] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[152] Martin Fowler,et al. Domain-Specific Languages , 2010, The Addison-Wesley signature series.
[153] Franz Franchetti,et al. Spiral-generated modular FFT algorithms , 2010, PASCO.
[154] Franz Franchetti,et al. A SIMD vectorizing compiler for digital signal processing algorithms , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[155] Franz Franchetti,et al. Computer Generation of Platform-Adapted Physical Layer Software , 2010 .
[156] Ken Kennedy,et al. The rise and fall of High Performance Fortran: an historical object lesson , 2007, HOPL.
[157] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[158] Franz Franchetti,et al. Hardware implementation of the discrete fourier transform with non-power-of-two problem size , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[159] James C. Hoe,et al. Automatic generation of customized discrete Fourier transform IPs , 2005, Proceedings. 42nd Design Automation Conference, 2005..
[160] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[161] John Shalf,et al. SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization , 2010 .
[162] Franz Franchetti,et al. Automatic Performance Optimization of the Discrete Fourier Transform on Distributed Memory Computers , 2006, ISPA.
[163] Chua-Huang Huang,et al. Multilinear algebra and parallel programming , 1990, Proceedings SUPERCOMPUTING '90.
[164] Franz Franchetti,et al. Performance/Energy Optimization of DSP Transforms on the XScale Processor , 2007, HiPEAC.
[165] Michael J. C. Gordon,et al. From LCF to HOL: a short history , 2000, Proof, Language, and Interaction.
[166] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst..
[167] Philip Heidelberger,et al. The Blue Gene/L Supercomputer: A Hardware and Software Story , 2007, International Journal of Parallel Programming.