NDS: N-Dimensional Storage
[1] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[2] John Shalf,et al. DRAM-Less: Hardware Acceleration of Data Processing with New Memory , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[3] Yang Liu,et al. Hippogriff: Efficiently moving data in heterogeneous computing systems , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[4] Anima Anandkumar,et al. Tensor Contractions with Extended BLAS Kernels on CPU and GPU , 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).
[5] Daniel J. Abadi,et al. Column-stores vs. row-stores: how different are they really? , 2008, SIGMOD Conference.
[6] Jinyoung Lee,et al. Biscuit: A Framework for Near-Data Processing of Big Data Workloads , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[7] Bojan Mrazovac,et al. Performance evaluation of using Protocol Buffers in the Internet of Things communication , 2016, 2016 International Conference on Smart Systems and Technologies (SST).
[8] John F. Stanton,et al. A massively parallel tensor contraction framework for coupled-cluster computations , 2014, J. Parallel Distributed Comput..
[9] John Shalf,et al. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[10] Rachata Ausavarungnirun,et al. The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[11] Chanik Park,et al. Enabling cost-effective data processing with smart SSD , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).
[12] Song Han,et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[14] John Thompson,et al. Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[15] Jimeng Sun,et al. An input-adaptive and in-place approach to dense tensor-times-matrix multiply , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Vivek Sarkar,et al. Automatic data layout generation and kernel mapping for CPU+GPU architectures , 2016, CC.
[17] H. Howie Huang,et al. G-Store: High-Performance Graph Store for Trillion-Edge Processing , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Marcin Zukowski,et al. DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing , 2008, DaMoN '08.
[19] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Javier González,et al. LightNVM: The Linux Open-Channel SSD Subsystem , 2017, FAST.
[21] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[22] S. Reinhardt,et al. AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing , 2019, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Tianshi Chen,et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Steven Swanson,et al. Morpheus: Creating Application Objects Efficiently for Heterogeneous Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[25] Paolo Bientinesi,et al. Design of a High-Performance GEMM-like Tensor–Tensor Multiplication , 2016, ACM Trans. Math. Softw..
[26] Andrea C. Arpaci-Dusseau,et al. Towards an Unwritten Contract of Intel Optane SSD , 2019, HotStorage.
[27] Yuan Xie,et al. Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs , 2019, MICRO.
[28] Yu-Ching Hu,et al. Dynamic Multi-Resolution Data Storage , 2019, MICRO.
[29] Jun Yang,et al. A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.
[30] James R. Larus,et al. Persona: A High-Performance Bioinformatics Framework , 2017, USENIX Annual Technical Conference.
[31] W. Dally,et al. SCNN , 2017 .
[32] Guy E. Blelloch,et al. GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.
[33] Devin A. Matthews,et al. High-Performance Tensor Contraction without Transposition , 2016, SIAM J. Sci. Comput..
[34] Onur Mutlu,et al. Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Peter Ahrens,et al. Tensor Algebra Compilation with Workspaces , 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[36] Naehyuck Chang,et al. PTL: PCM Translation Layer , 2012, 2012 IEEE Computer Society Annual Symposium on VLSI.
[37] Myoungsoo Jung,et al. TensorPRAM: Designing a Scalable Heterogeneous Deep Learning Accelerator with Byte-addressable PRAMs , 2020, HotStorage.
[38] Erik Brunvand,et al. Impulse: building a smarter memory controller , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[39] Nitish Srivastava,et al. Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[40] Gustavo Alonso,et al. Ibex - An Intelligent Storage Engine with Support for Advanced SQL Off-loading , 2014, Proc. VLDB Endow..
[41] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[42] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang..
[43] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[44] David J. DeWitt,et al. Query processing on smart SSDs: opportunities and challenges , 2013, SIGMOD '13.
[45] Robert A. van de Geijn,et al. An API for Manipulating Matrices Stored by Blocks , 2004 .
[46] T. N. Vijaykumar,et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks , 2019, MICRO.
[47] Joel H. Saltz,et al. Active disks: programming model, algorithms and evaluation , 1998, ASPLOS VIII.
[48] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[49] Yi Yang,et al. Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[50] Tamara G. Kolda,et al. Efficient MATLAB Computations with Sparse and Factored Tensors , 2007, SIAM J. Sci. Comput..
[51] Benoît Pradelle,et al. Memory-efficient parallel tensor decompositions , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[52] Benoît Meister,et al. Optimization of symmetric tensor computations , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[53] Tamara G. Kolda,et al. Parallel Tensor Compression for Large-Scale Scientific Data , 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[54] Nitish Srivastava,et al. MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[55] Paul H. Siegel,et al. Characterizing flash memory: Anomalies, observations, and applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[56] Kiran Kumar Matam,et al. GraphSSD: Graph Semantics Aware SSD , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[57] Yang Liu,et al. Willow: A User-Programmable SSD , 2014, OSDI.
[58] Myoungsoo Jung,et al. Flashabacus: a self-governing flash-based accelerator for low-power systems , 2018, EuroSys.
[59] Henry M. Levy,et al. Virtual Memory Management in the VAX/VMS Operating System , 1982, Computer.
[60] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang..
[61] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[62] John D. Owens,et al. GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU , 2019, ACM Trans. Math. Softw..
[63] Animesh Trivedi,et al. Albis: High-Performance File Format for Big Data Systems , 2018, USENIX Annual Technical Conference.
[64] Frank Nielsen,et al. K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching , 2010, 2010 IEEE International Conference on Image Processing.
[65] Sungjin Lee,et al. BlueDBM: An appliance for Big Data analytics , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[66] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[67] David A. Bader,et al. Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, Georgia Institute of Technology, Atlanta, GA, USA, February 13-14, 2012. Proceedings , 2013, Graph Partitioning and Graph Clustering.
[68] Steven Swanson,et al. Summarizer: Trading Communication with Computing Near Storage , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[69] Bora Uçar,et al. High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[70] Robert A. van de Geijn,et al. The libflame Library for Dense Matrix Computations , 2009, Computing in Science & Engineering.
[71] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[72] Jason Cong,et al. RC-NVM: Dual-Addressing Non-Volatile Memory Architecture Supporting Both Row and Column Memory Accesses , 2019, IEEE Transactions on Computers.
[73] Fan Yang,et al. LFTF: A Framework for Efficient Tensor Analytics at Scale , 2017, Proc. VLDB Endow..
[74] Ganesh G Surve,et al. Parallel implementation of Bellman-ford algorithm using CUDA architecture , 2017, 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA).
[75] Kevin Skadron,et al. HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[76] Eun-Jin Im,et al. Model-Based Memory Hierarchy Optimizations for Sparse Matrices , 2007 .
[77] Victor Podlozhnyuk,et al. Image Convolution with CUDA , 2007 .