A scalable processing-in-memory accelerator for parallel graph processing
暂无分享,去创建一个
Kiyoung Choi | Onur Mutlu | Sungpack Hong | Sungjoo Yoo | Junwhan Ahn | O. Mutlu | Junwhan Ahn | S. Yoo | Kiyoung Choi | Sungpack Hong
[1] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.
[2] Shirish Tatikonda,et al. From "Think Like a Vertex" to "Think Like a Graph" , 2013, Proc. VLDB Endow..
[3] Seung-Moon Yoo,et al. FlexRAM: Toward an advanced Intelligent Memory system , 1999, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[4] Kunle Olukotun,et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[5] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[6] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[7] Peter M. Kogge,et al. EXECUBE-A New Architecture for Scaleable MPPs , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[8] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[9] Rajeev Balasubramonian,et al. Quantifying the relationship between the power delivery network and architectural policies in a 3D-stacked memory device , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Mehrzad Samadi,et al. Memory-centric system interconnect design with hybrid memory cubes , 2013, PACT 2013.
[11] Thomas F. Wenisch,et al. Thin servers with smart pipes: designing SoC accelerators for memcached , 2013, ISCA.
[12] Mike Ignatowski,et al. TOP-PIM: throughput-oriented programmable processing in memory , 2014, HPDC '14.
[13] Kunle Olukotun,et al. Simplifying Scalable Graph Processing with a Domain-Specific Language , 2014, CGO '14.
[14] Gabriel H. Loh,et al. Thermal Feasibility of Die-Stacked Processing in Memory , 2014 .
[15] Kenneth A. Ross,et al. Navigating big data with high-throughput, energy-efficient data partitioning , 2013, ISCA.
[16] Parthasarathy Ranganathan,et al. From Microprocessors to Nanostores: Rethinking Data-Centric Systems , 2011, Computer.
[17] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[18] Kunle Olukotun,et al. Green-Marl: a DSL for easy and efficient graph analysis , 2012, ASPLOS XVII.
[19] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .
[20] Onur Mutlu,et al. Accelerating critical section execution with asymmetric multi-core architectures , 2009, ASPLOS.
[21] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[22] Ronald G. Dreslinski,et al. Integrated 3D-stacked server designs for increasing physical density of key-value stores , 2014, ASPLOS.
[23] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[24] Nancy M. Amato,et al. KLA: A new algorithmic paradigm for parallel graph computations , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[25] Christopher J. Hughes,et al. Memory-side prefetching for linked data structures for processor-in-memory systems , 2005, J. Parallel Distributed Comput..
[26] Krisztián Flautner,et al. PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.
[27] Babak Falsafi,et al. Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] P. J. Narayanan,et al. Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.
[29] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[30] Jaeha Kim,et al. Memory-centric system interconnect design with Hybrid Memory Cubes , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[31] Jennifer Widom,et al. GPS: a graph processing system , 2013, SSDBM.
[32] Gabriel H. Loh Nuwan Jayasena Mark H. Oskin Mark Nutter Da Ignatowski. A Processing-in-Memory Taxonomy and a Case for Studying Fixed-function PIM , 2013 .
[33] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[34] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[35] Andrew Birrell,et al. Implementing remote procedure calls , 1984, TOCS.
[36] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[37] Chia-Lin Yang,et al. Push vs. pull: data movement for linked data structures , 2000, ICS '00.
[38] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.
[39] Josep Torrellas,et al. Using a user-level memory thread for correlation prefetching , 2002, ISCA.
[40] Maya Gokhale,et al. Processing in Memory: The Terasys Massively Parallel PIM Array , 1995, Computer.
[41] Rachata Ausavarungnirun,et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[43] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[44] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[45] Kunle Olukotun,et al. Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.
[46] Eric S. Chung,et al. LINQits: big data on little clients , 2013, ISCA.
[47] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[48] Franz Franchetti,et al. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing , 2013, 2013 IEEE International 3D Systems Integration Conference (3DIC).
[49] Steven Swanson,et al. Near-Data Processing: Insights from a MICRO-46 Workshop , 2014, IEEE Micro.
[50] Krishna P. Gummadi,et al. Measurement and analysis of online social networks , 2007, IMC '07.
[51] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..
[52] Carlos Guestrin,et al. Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .
[53] Jaewook Shin,et al. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[54] K. Yelick,et al. Intelligent RAM (IRAM): chips that remember and compute , 1997, 1997 IEEE International Solids-State Circuits Conference. Digest of Technical Papers.
[55] Franz Franchetti,et al. Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware , 2013, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[56] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[57] Feifei Li,et al. NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).