Abstractions for Programming Graphics Processors in High-Level Programming Languages
[1] Prabhat,et al. Cataloging the Visible Universe Through Bayesian Inference at Petascale , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[2] Dennis Shasha,et al. Parakeet: a just-in-time parallel accelerator for python , 2012, HotPar'12.
[3] Robert L. Henderson,et al. Job Scheduling Under the Portable Batch System , 1995, JSSPP.
[4] Andy B. Yoo,et al. X-ray Pulse Compression Using Strained Crystals , 2002 .
[5] Lei Xing,et al. GPU computing in medical physics: a review , 2011, Medical Physics.
[6] Partha Pratim Pande,et al. Hardware accelerators for biocomputing: A survey , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.
[7] M. S. Krishnan,et al. An Empirical Analysis of Productivity and Quality in Software Products , 2000 .
[8] Walid Taha,et al. MetaML and multi-stage programming with explicit annotations , 2000, Theor. Comput. Sci.
[9] U. Naumann. Optimized Jacobian Accumulation Techniques , 2000 .
[10] Brian E. Granger,et al. IPython: A System for Interactive Scientific Computing , 2007, Computing in Science & Engineering.
[11] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[12] Dan Moldovan,et al. Tangent: Automatic Differentiation Using Source Code Transformation in Python , 2017, ArXiv.
[13] Martin Elsman,et al. Futhark: purely functional GPU-programming with nested parallelism and in-place array updates , 2017, PLDI.
[14] Andreas Herkersdorf,et al. Enabling FPGAs in Hyperscale Data Centers , 2015, 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom).
[15] Wilfried Philips,et al. Quasar — A new heterogeneous programming framework for image and video processing algorithms on CPU and GPU , 2014, 2014 IEEE International Conference on Image Processing (ICIP).
[16] Robert Groth. Is the software industry's productivity declining? , 2004, IEEE Software.
[17] Shawki Areibi,et al. Deep Learning on FPGAs: Past, Present, and Future , 2016, ArXiv.
[18] Dhabaleswar K. Panda,et al. High Performance RDMA-Based MPI Implementation over InfiniBand , 2003, ICS '03.
[19] Michael W. Godfrey,et al. Evolution in open source software: a case study , 2000, Proceedings 2000 International Conference on Software Maintenance.
[20] Lei Wang,et al. Variational quantum eigensolver with fewer qubits , 2019, Physical Review Research.
[21] Raphael Landaverde,et al. An investigation of Unified Memory Access performance in CUDA , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[22] Takahiro Harada. A framework to transform in-core GPU algorithms to out-of-core algorithms , 2016, I3D.
[23] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst.
[24] Qing Nie,et al. DifferentialEquations.jl – A Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia , 2017, Journal of Open Research Software.
[25] Lennart Ohlsson,et al. PyGPU: A high-level language for high-speed image processing , 2007 .
[26] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
[27] Siu Kwan Lam,et al. Numba: a LLVM-based Python JIT compiler , 2015, LLVM '15.
[28] L. Hogben. Handbook of Linear Algebra , 2006 .
[29] Robert J. Harrison,et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers , 1994, Proceedings of Supercomputing '94.
[30] Qin Zhang,et al. Improving software development management through software project telemetry , 2005, IEEE Software.
[31] Philipp Slusallek,et al. AnyDSL: a partial evaluation framework for programming high-performance libraries , 2018, Proc. ACM Program. Lang.
[32] Michael Innes,et al. Fashionable Modelling with Flux , 2018, ArXiv.
[33] R. Mises,et al. Praktische Verfahren der Gleichungsauflösung , 1929 .
[34] Bjorn De Sutter,et al. Dynamic Automatic Differentiation of GPU Broadcast Kernels , 2018, NIPS 2018.
[35] Peter Lancaster,et al. Norms on direct sums and tensor products , 1972 .
[36] Michael Innes,et al. Don't Unroll Adjoint: Differentiating SSA-Form Programs , 2018, ArXiv.
[37] Frédo Durand,et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph.
[38] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[39] Barry W. Boehm,et al. Productivity trends in incremental and iterative software development , 2009, 2009 3rd International Symposium on Empirical Software Engineering and Measurement.
[40] Sujatha R. Upadhyaya,et al. Parallel approaches to machine learning - A comprehensive survey , 2013, J. Parallel Distributed Comput.
[41] Martin Odersky,et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.
[42] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[43] Christian Terboven,et al. OpenACC - First Experiences with Real-World Applications , 2012, Euro-Par.
[44] Krunal Patel,et al. ArrayFire: a GPU acceleration platform , 2012, Defense, Security, and Sensing.
[45] Barak A. Pearlmutter,et al. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator , 2008, TOPLAS.
[46] Dimitrios Soudris,et al. A survey on reconfigurable accelerators for cloud computing , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[47] Uwe Naumann,et al. Optimal Jacobian accumulation is NP-complete , 2007, Math. Program.
[48] Ken Kennedy,et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.
[49] Gaël Varoquaux,et al. The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.
[50] Anton Alexandrovich Malakhov. Composable Multi-Threading for Python Libraries , 2016 .
[51] Simon L. Peyton Jones,et al. Efficient differentiable programming in a functional array-processing language , 2018, Proc. ACM Program. Lang.
[52] Daniel Zingaro,et al. Modern Extensible Languages , 2007 .
[53] Frédo Durand,et al. Differentiable programming for image processing and deep learning in halide , 2018, ACM Trans. Graph.
[54] Jacques Pienaar,et al. MLIR Primer: A Compiler Infrastructure for the End of Moore’s Law , 2019 .
[55] Karl Rupp,et al. ViennaCL-A High Level Linear Algebra Library for GPUs and Multi-Core CPUs , 2010 .
[56] Jingyue Wu,et al. gpucc: An open-source GPGPU compiler , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[57] Fei Wang,et al. Demystifying differentiable programming: shift/reset the penultimate backpropagator , 2018, Proc. ACM Program. Lang.
[58] Miles Lubin,et al. Forward-Mode Automatic Differentiation in Julia , 2016, ArXiv.
[59] Ian J. Goodfellow,et al. DLVM: A Modern Compiler Framework for Neural Network DSLs , 2017 .
[60] Martin Odersky,et al. Spiral in scala: towards the systematic construction of generators for performance libraries , 2014, GPCE '13.
[61] Jan Vitek,et al. Julia subtyping: a rational reconstruction , 2018, Proc. ACM Program. Lang.