Nectarios Koziris | Onur Mutlu | Christina Giannoula | Juan Gómez-Luna | Georgios Goumas | Ivan Fernandez
[1] James Demmel,et al. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[2] Torsten Hoefler,et al. SlimSell: A Vectorizable Graph Representation for Breadth-First Search , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[3] Tianshi Chen,et al. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Richard W. Vuduc,et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.
[5] Mattan Erez,et al. Near Data Acceleration with Concurrent Host Access , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[6] Nectarios Koziris,et al. Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels , 2009, 2009 International Conference on Parallel Processing.
[7] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[8] Rachata Ausavarungnirun,et al. Enabling Practical Processing in and near Memory for Data-Intensive Computing , 2019, DAC.
[9] Onur Mutlu,et al. LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory , 2017, IEEE Computer Architecture Letters.
[10] Peter M. Kogge,et al. Scalability of Hybrid Sparse Matrix Dense Vector (SpMV) Multiplication , 2018, 2018 International Conference on High Performance Computing & Simulation (HPCS).
[11] Onur Mutlu,et al. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks , 2021, IEEE Access.
[12] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[13] Srinivasan Parthasarathy,et al. Efficient sparse-matrix multi-vector product on GPUs , 2018, HPDC.
[14] Nectarios Koziris,et al. Understanding the Performance of Sparse Matrix-Vector Multiplication , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[15] Guangming Tan,et al. TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[16] Carole-Jean Wu,et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Rob H. Bisseling,et al. Communication balancing in parallel sparse matrix-vector multiplication , 2005 .
[18] Onur Mutlu,et al. Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware , 2021, 2021 12th International Green and Sustainable Computing Conference (IGSC).
[19] N. Koziris,et al. Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures , 2019, SC.
[20] Shengen Yan,et al. yaSpMV: yet another SpMV framework on GPUs , 2014, PPoPP.
[21] Brian Vinter,et al. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[22] Bahar Asgari,et al. ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[23] Nectarios Koziris,et al. Optimizing sparse matrix-vector multiplication using index and value compression , 2008, CF '08.
[24] Dimin Niu,et al. iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[25] Rahul Boyapati,et al. Active-Routing: Compute on the Way for Near-Data Processing , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Yun Liang,et al. Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[27] Rachata Ausavarungnirun,et al. Processing Data Where It Makes Sense: Enabling In-Memory Computation , 2019, Microprocess. Microsystems.
[28] Eric S. Chung,et al. A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[29] Robert D. Falgout. An Introduction to Algebraic Multigrid , 2006 .
[30] Onur Mutlu,et al. Improving DRAM performance by parallelizing refreshes with accesses , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[31] Udo W. Pooch,et al. A Survey of Indexing Techniques for Sparse Matrices , 1973, CSUR.
[32] Minsoo Rhu,et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning , 2019, MICRO.
[33] Shin-Dug Kim,et al. Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks , 2021, IEEE Access.
[34] Brian Vinter,et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.
[35] Martin D. Schatz,et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[36] Rachata Ausavarungnirun,et al. CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[37] Oscar Plata,et al. NATSA: A Near-Data Processing Accelerator for Time Series Analysis , 2020, 2020 IEEE 38th International Conference on Computer Design (ICCD).
[38] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[39] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[40] Bora Uçar,et al. Semi-two-dimensional Partitioning for Parallel Sparse Matrix-Vector Multiplication , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[41] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[42] Nectarios Koziris,et al. CSX: an extended compression format for SpMV on shared memory systems , 2011, PPoPP '11.
[43] Weixing Ji,et al. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms , 2020, Int. J. High Perform. Comput. Appl..
[44] Onur Mutlu,et al. SIMDRAM: a framework for bit-serial SIMD processing using DRAM , 2020, ASPLOS.
[45] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[46] Pen-Chung Yew,et al. Variable-Sized Blocks for Locality-Aware SpMV , 2021, 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[47] Maurice Herlihy,et al. Concurrent Data Structures with Near-Data-Processing: an Architecture-Aware Implementation , 2019, SPAA.
[48] Tanya Y. Berger-Wolf,et al. AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs , 2013, 2013 42nd International Conference on Parallel Processing.
[49] Hans-Peter Seidel,et al. Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU , 2017, ICS.
[50] Andrew Lumsdaine,et al. Accelerating sparse matrix computations via data compression , 2006, ICS '06.
[51] Jiajia Li,et al. Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture , 2018, ACM Trans. Math. Softw..
[52] Eriko Nurvitadhi,et al. A sparse matrix vector multiply accelerator for support vector machine , 2015, 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).
[53] Fabio Checconi,et al. ALTO: adaptive linearized storage of sparse tensors , 2021, ICS.
[54] Nectarios Koziris,et al. SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms , 2018, ACM Trans. Math. Softw..
[55] Onur Mutlu,et al. Processing-in-memory: A workload-driven perspective , 2019, IBM J. Res. Dev..
[56] Shoaib Kamil,et al. Taco: A tool to generate tensor algebra kernels , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).
[57] Rajeev Balasubramonian,et al. OrderLight: Lightweight Memory-Ordering Primitive for Efficient Fine-Grained PIM Computations , 2021, MICRO.
[58] Yu Wang,et al. GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[59] Ninghui Sun,et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication , 2013, PLDI.
[60] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[61] Kenli Li,et al. Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs , 2015, IEEE Transactions on Computers.
[62] Yanzhi Wang,et al. GraphQ: Scalable PIM-Based Graph Processing , 2019, MICRO.
[63] Fabrice Devaux,et al. The true Processing In Memory accelerator , 2019, 2019 IEEE Hot Chips 31 Symposium (HCS).
[64] Feng Yan,et al. Efficient PageRank and SpMV Computation on AMD GPUs , 2010, 2010 39th International Conference on Parallel Processing.
[65] Donghyuk Lee,et al. Near-memory data transformation for efficient sparse matrix multi-vector multiplication , 2019, SC.
[66] Eriko Nurvitadhi,et al. Fine-grained accelerators for sparse machine learning workloads , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).
[67] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[68] Jack Dongarra,et al. Sparse Matrix Libraries in C++ for High Performance Architectures , 1997 .
[69] Sudhakar Yalamanchili,et al. Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[70] Nectarios Koziris,et al. SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[71] Kenli Li,et al. A hybrid computing method of SpMV on CPU-GPU heterogeneous computing systems , 2017, J. Parallel Distributed Comput..
[72] Franz Franchetti,et al. Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[73] Xin Liu,et al. Towards Efficient SpMV on Sunway Manycore Architectures , 2018, ICS.
[74] Maurice Herlihy,et al. Concurrent Data Structures for Near-Memory Computing , 2017, SPAA.
[75] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.
[76] Onur Mutlu,et al. A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[77] Nathan Beckmann,et al. Livia: Data-Centric Computing Throughout the Memory Hierarchy , 2020, ASPLOS.
[78] Lei Deng,et al. SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[79] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[80] Timothy A. Davis,et al. The University of Florida Sparse Matrix Collection , 2011, TOMS.
[81] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[82] Nectarios Koziris,et al. Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors , 2017, 2017 46th International Conference on Parallel Processing (ICPP).
[83] Ngai Wong,et al. Design space exploration for sparse matrix-matrix multiplication on FPGAs , 2010, 2010 International Conference on Field-Programmable Technology.
[84] Kenli Li,et al. Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling , 2015, IEEE Transactions on Parallel and Distributed Systems.
[85] Yao Zhang,et al. Scan primitives for GPU computing , 2007, GH '07.
[86] Pascal Hénon,et al. PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems , 2002, Parallel Comput..
[87] Kishore Kothapalli,et al. Architecture- and workload- aware heterogeneous algorithms for sparse matrix vector multiplication , 2014, COMPUTE '14.
[88] Michele Martone,et al. Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format , 2014, Parallel Comput..
[89] Greg Linden,et al. Amazon.com Recommendations: Item-to-Item Collaborative Filtering , 2001 .
[90] Kashif Nizam Khan,et al. RAPL in Action: Experiences in Using RAPL for Power Measurements , 2020 .
[91] Parallel Hash Table Design for NDP Systems , 2020, MEMSYS.
[92] Robert D. Falgout,et al. hypre: A Library of High Performance Preconditioners , 2002, International Conference on Computational Science.
[93] Eriko Nurvitadhi,et al. Hardware accelerator for analytics of sparse data , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[94] Michael Garland,et al. Merge-Based Parallel Sparse Matrix-Vector Multiplication , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[95] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[96] Kurt Keutzer,et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs , 2012, ICS '12.
[97] Magnus Jahre,et al. An energy efficient column-major backend for FPGA SpMV accelerators , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[98] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[99] Onur Mutlu,et al. Chapter Four - Simple Operations in Memory to Reduce Data Movement , 2017, Adv. Comput..
[100] Rachata Ausavarungnirun,et al. A Modern Primer on Processing in Memory , 2020, ArXiv.
[101] Rudolf Eigenmann,et al. Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems , 2008, ICS '08.
[102] Kenli Li,et al. Optimization of quasi-diagonal matrix–vector multiplication on GPU , 2014, Int. J. High Perform. Comput. Appl..
[103] Rok Sosic,et al. SNAP , 2016, ACM Trans. Intell. Syst. Technol..
[104] Ping Guo,et al. A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs , 2014, IEEE Transactions on Parallel and Distributed Systems.
[105] Onur Mutlu,et al. Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture , 2021, ArXiv.
[106] Calvin J. Ribbens,et al. Pattern-based sparse matrix representation for memory-efficient SMVM kernels , 2009, ICS.
[107] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[108] Y. Saad,et al. Krylov Subspace Methods on Supercomputers , 1989 .
[109] Srinivasan Parthasarathy,et al. Automatic Selection of Sparse Matrix Representation on GPUs , 2015, ICS.
[110] Samuel Williams,et al. Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[111] Wayne Luk,et al. Accelerating SpMV on FPGAs by Compressing Nonzero Values , 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.
[112] O Seongil,et al. Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[113] Tze Meng Low,et al. Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization , 2019, MICRO.
[114] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[115] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[116] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[117] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[118] Bahar Asgari,et al. Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads , 2021, 2021 IEEE International Symposium on Workload Characterization (IISWC).
[119] Babak Falsafi,et al. The Mondrian Data Engine , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[120] Feng Shi,et al. Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU , 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[121] Marcin Paprzycki,et al. On BLAS Operations with Recursively Stored Sparse Matrices , 2010, 2010 12th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.
[122] Katherine A. Yelick,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.
[123] Mattan Erez,et al. Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators , 2020, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[124] Yue Zhao,et al. Bridging the gap between deep learning and sparse matrix format selection , 2018, PPoPP.
[125] Dominique Lavenier,et al. Variant Calling Parallelization on Processor-in-Memory Architecture , 2020, bioRxiv.
[126] Pavel Tvrdík,et al. Evaluation Criteria for Sparse Matrix Storage Formats , 2016, IEEE Transactions on Parallel and Distributed Systems.
[127] Sander Stuijk,et al. Near-Memory Computing: Past, Present, and Future , 2019, Microprocess. Microsystems.
[128] Jung Ho Ahn,et al. Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[129] Nectarios Koziris,et al. Performance evaluation of the sparse matrix-vector multiplication on modern architectures , 2009, The Journal of Supercomputing.
[130] Onur Mutlu,et al. Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[131] Yinghai Lu,et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.
[132] Beata Bylina,et al. Performance analysis of multicore and multinodal implementation of SpMV operation , 2014, 2014 Federated Conference on Computer Science and Information Systems.
[133] Åke Björck. Numerical Methods for Least Squares Problems , 1996 .
[134] Arutyun Avetisyan,et al. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures , 2010, HiPEAC.
[135] Jung Ho Ahn,et al. TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory , 2021, MICRO.
[136] Onur Mutlu,et al. SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations , 2019, MICRO.
[137] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[138] Xiaosong Ma,et al. Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[139] Yue Zhao,et al. Overhead-Conscious Format Selection for SpMV-Based Applications , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[140] Ramyad Hadidi,et al. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).