Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads
Li Zhao | Yuan Xie | Xing Hu | Shuangchen Li | Sang Min Oh | Xiaowei Jiang | Abanti Basak | Xinfeng Xie
[1] David Blaauw, et al. Compute Caches, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Onur Mutlu, et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[3] Mohan Kumar, et al. Mosaic: Processing a Trillion-Edge Graph on a Single Machine, 2017, EuroSys.
[4] Jimmy J. Lin, et al. GraphJet: Real-Time Content Recommendations at Twitter, 2016, Proc. VLDB Endow.
[5] Jimmy Lin. Scale Up or Scale Out for Graph Processing?, 2018, IEEE Internet Computing.
[6] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[7] Willy Zwaenepoel, et al. X-Stream: edge-centric graph processing using streaming partitions, 2013, SOSP.
[8] Dirk Grunwald, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[9] Norman P. Jouppi, et al. CACTI 6.0: A Tool to Model Large Caches, 2009.
[10] Onur Mutlu, et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[11] Joseph Gonzalez, et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, 2012, OSDI.
[12] Ruby B. Lee, et al. Random Fill Cache Architecture, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[13] Ramyad Hadidi, et al. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Jinha Kim, et al. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC, 2013, KDD.
[15] Kei Hiraki, et al. Access Map Pattern Matching for High Performance Data Cache Prefetch, 2011, J. Instr. Level Parallelism.
[16] Vladimir Vlassov, et al. High-Level Programming Abstractions for Distributed Graph Processing, 2016, IEEE Transactions on Knowledge and Data Engineering.
[17] James E. Smith, et al. Data Cache Prefetching Using a Global History Buffer, 2005, IEEE Micro.
[18] Guy E. Blelloch, et al. GraphChi: Large-Scale Graph Computation on Just a PC, 2012, OSDI.
[19] Feifei Li, et al. Graph Analytics Through Fine-Grained Parallelism, 2016, SIGMOD Conference.
[20] David A. Patterson, et al. The GAP Benchmark Suite, 2015, ArXiv.
[21] Hideki Ando, et al. MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] Willy Zwaenepoel, et al. Everything you always wanted to know about multicore graph processing but were afraid to ask, 2017, USENIX Annual Technical Conference.
[23] Chia-Lin Yang, et al. Push vs. pull: data movement for linked data structures, 2000, ICS '00.
[24] Haixun Wang, et al. Trinity: a distributed graph engine on a memory cloud, 2013, SIGMOD '13.
[25] Stijn Eyerman, et al. An Evaluation of High-Level Mechanistic Core Models, 2014, ACM Trans. Archit. Code Optim.
[26] Lina Sawalha, et al. x86 computer architecture simulators: A comparative study, 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[27] Judy Qiu, et al. Performance Characterization of Multi-threaded Graph Processing Applications on Many-Integrated-Core Architecture, 2017, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[28] Srinivas Devadas, et al. IMP: Indirect memory prefetcher, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] Jignesh M. Patel, et al. Data prefetching by dependence graph precomputation, 2001, ISCA 2001.
[30] Rok Sosic, et al. SNAP, 2016, ACM Trans. Intell. Syst. Technol.
[31] Jure Leskovec, et al. SNAP Datasets: Stanford Large Network Dataset Collection, 2014.
[32] Jimmy J. Lin, et al. WTF: the who to follow service at Twitter, 2013, WWW.
[33] Andreas Moshovos, et al. Dependence based prefetching for linked data structures, 1998, ASPLOS VIII.
[34] Kiyoung Choi, et al. A scalable processing-in-memory accelerator for parallel graph processing, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[35] David A. Patterson, et al. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server, 2015, 2015 IEEE International Symposium on Workload Characterization.
[36] Ching-Yung Lin, et al. GraphBIG: understanding graph computing in the context of industrial solutions, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] Onur Mutlu, et al. Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Brian Fahs, et al. Microarchitecture optimizations for exploiting memory-level parallelism, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[39] Jinchun Kim, et al. Path confidence based lookahead prefetching, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[40] Onur Mutlu, et al. Accelerating Dependent Cache Misses with an Enhanced Memory Controller, 2016, ISCA.
[41] Avery Ching, et al. One Trillion Edges: Graph Processing at Facebook-Scale, 2015, Proc. VLDB Endow.
[42] Ozcan Ozturk, et al. Energy Efficient Architecture for Graph Analytics Accelerators, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[43] Kang Chen, et al. Wonderland: A Novel Abstraction-Based Out-Of-Core Graph Processing System, 2018, ASPLOS.
[44] Lieven Eeckhout, et al. Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads, 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[45] Dirk Grunwald, et al. A stateless, content-directed data prefetching mechanism, 2002, ASPLOS X.
[46] Guy E. Blelloch, et al. Ligra: a lightweight graph processing framework for shared memory, 2013, PPoPP '13.
[47] Jure Leskovec, et al. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time, 2017, WWW.
[48] Christopher J. Hughes, et al. Memory-side prefetching for linked data structures for processor-in-memory systems, 2005, J. Parallel Distributed Comput.
[49] Tianshi Chen, et al. TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing, 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[50] Yiran Chen, et al. GraphR: Accelerating Graph Processing Using ReRAM, 2017, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[51] Onur Mutlu, et al. Prefetch-Aware DRAM Controllers, 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[52] Yuan Xie, et al. Exploring Core and Cache Hierarchy Bottlenecks in Graph Processing Workloads, 2018, IEEE Computer Architecture Letters.
[53] Mahmut T. Kandemir, et al. Meeting midway: Improving CMP performance with memory-side prefetching, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[54] Onur Mutlu, et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[55] Lieven Eeckhout, et al. Sniper: scalable and accurate parallel multi-core simulation, 2012.
[56] Carlos Guestrin, et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.
[57] Pradeep Dubey, et al. Navigating the maze of graph analytics frameworks using massive graph datasets, 2014, SIGMOD Conference.
[58] Margaret Martonosi, et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[59] Wenguang Chen, et al. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning, 2015, USENIX ATC.
[60] Pararth Shah, et al. Ringo: Interactive Graph Analytics on Big-Memory Machines, 2015, SIGMOD Conference.
[61] Sachin Katti, et al. Parallel Graph Processing on Modern Multi-core Servers: New Findings and Remaining Challenges, 2016, 2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS).
[62] Seth H. Pugsley, et al. Efficiently prefetching complex address patterns, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[63] Paolo Faraboschi, et al. Parallel Graph Processing: Prejudice and State of the Art, 2016, ICPE.
[64] Pierre Michaud. Best-offset hardware prefetching, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[65] Sam Ainsworth, et al. Graph Prefetching Using Data Structure Knowledge, 2016, ICS.