Substream-Centric Maximum Matchings on FPGA
Torsten Hoefler | Johannes de Fine Licht | Marc Fischer | Maciej Besta | Dimitri Stanojevic | Tal Ben-Nun
[1] Yu Wang,et al. ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture , 2017, FPGA.
[2] Torsten Hoefler,et al. Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis , 2019, FPGA.
[3] Mark E. J. Newman. A measure of betweenness centrality based on random walks , 2005, Soc. Networks.
[4] Graham Cormode,et al. The Sparse Awakens: Streaming Algorithms for Matching Size Estimation in Sparse Graphs , 2016, ESA.
[5] Torsten Hoefler,et al. Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations , 2018, ArXiv.
[6] Hayden Kwok-Hay So,et al. GraVF: A vertex-centric distributed graph processing framework on FPGAs , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[7] Dejan Markovic,et al. A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs , 2014, FPGA.
[8] Douglas J. Klein,et al. On some solved and unsolved problems of chemical graph theory , 1986 .
[9] Torsten Hoefler,et al. Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations , 2015, ICS.
[10] Viktor K. Prasanna,et al. Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform , 2017, 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[11] Elkin Garcia,et al. A Reconfigurable Computing System Based on a Cache-Coherent Fabric , 2011, 2011 International Conference on Reconfigurable Computing and FPGAs.
[12] Yu Wang,et al. Parallel FPGA-based all pairs shortest paths for sparse networks: A human brain connectome case study , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[13] Jing Li,et al. Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform , 2018, FPGA.
[14] Nanning Zheng,et al. Stereo Matching Using Belief Propagation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[15] Sudipto Guha,et al. Linear programming in the semi-streaming model with application to the maximum matching problem , 2011, Inf. Comput..
[16] Sanjeev Arora,et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..
[17] Luca Benini,et al. Network-accelerated non-contiguous memory transfers , 2019, SC.
[18] James C. Hoe,et al. GraphGen for CoRAM : Graph Computation on FPGAs , 2013 .
[19] Samson Zhou,et al. Streaming Weighted Matchings: Optimal Meets Greedy , 2016, ArXiv.
[20] A. Kemper,et al. On Graph Problems in a Semi-streaming Model , 2015 .
[21] Sofya Vorotnikova,et al. Planar Matching in Streams Revisited , 2016, APPROX-RANDOM.
[22] Tianshi Chen,et al. TuNao: A High-Performance and Energy-Efficient Reconfigurable Accelerator for Graph Processing , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[23] Avery Ching,et al. One Trillion Edges: Graph Processing at Facebook-Scale , 2015, Proc. VLDB Endow..
[24] Ashish Goel,et al. On the communication and streaming complexity of maximum bipartite matching , 2012, SODA.
[25] Mayur Datar,et al. On the streaming model augmented with a sorting primitive , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.
[26] Mariano Zelke,et al. Weighted Matching in the Semi-Streaming Model , 2007, Algorithmica.
[27] Torsten Hoefler,et al. Evaluating the Cost of Atomic Operations on Modern Architectures , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[28] Hayden Kwok-Hay So,et al. Vertex-Centric Graph Processing on FPGA , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[29] Phillip H. Jones,et al. CyGraph: A Reconfigurable Architecture for Parallel Breadth-First Search , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[30] Reynold Xin,et al. Apache Spark , 2016 .
[31] Reuven Bar-Yehuda,et al. A unified approach to approximating resource allocation and scheduling , 2001, JACM.
[32] Torsten Hoefler,et al. Fault tolerance for remote memory access programming models , 2014, HPDC '14.
[33] Hossam A. ElGindy,et al. On sparse matrix-vector multiplication with FPGA-based system , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[34] Kenneth Steiglitz,et al. Combinatorial Optimization: Algorithms and Complexity , 1981 .
[35] Torsten Hoefler,et al. Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism , 2019, ArXiv.
[36] Torsten Hoefler,et al. Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries , 2019, ArXiv.
[37] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014 .
[38] Richard M. Karp,et al. An optimal algorithm for on-line bipartite matching , 1990, STOC '90.
[39] Torsten Hoefler,et al. SlimSell: A Vectorizable Graph Representation for Breadth-First Search , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[40] S. Muthukrishnan,et al. Data streams: algorithms and applications , 2005, SODA '03.
[41] James C. Hoe,et al. GraphGen: An FPGA Framework for Vertex-Centric Graph Computation , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[42] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[43] Viktor K. Prasanna,et al. High-Throughput and Energy-Efficient Graph Processing on FPGA , 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[44] Pengcheng Yao,et al. An efficient graph accelerator with parallel data conflict management , 2018, PACT.
[45] Yu Wang,et al. FPGP: Graph Processing Framework on FPGA: A Case Study of Breadth-First Search , 2016, FPGA.
[46] Gustavo Alonso,et al. Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures , 2017, SIGMOD Conference.
[47] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[48] Thomas Schank,et al. Algorithmic Aspects of Triangle-Based Network Analysis , 2007 .
[49] Torsten Hoefler,et al. Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication , 2019, SC.
[50] Torsten Hoefler,et al. Communication-avoiding parallel minimum cuts and connected components , 2018, PPoPP.
[51] Jing Li,et al. Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform , 2018, FPGA.
[52] Torsten Hoefler,et al. High-Performance Distributed RMA Locks , 2016, HPDC.
[53] Sofya Vorotnikova,et al. Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams , 2016, SODA.
[54] F. Massey,et al. Introduction to Statistical Analysis , 1970 .
[55] Torsten Hoefler,et al. Scientific Benchmarking of Parallel Computing Systems: Twelve ways to tell the masses when reporting performance results , 2017 .
[56] Yong Dou,et al. An FPGA Implementation for Solving the Large Single-Source-Shortest-Path Problem , 2016, IEEE Transactions on Circuits and Systems II: Express Briefs.
[57] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[58] Kiyoung Choi,et al. ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator , 2017, Proc. VLDB Endow..
[59] Torsten Hoefler,et al. A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[60] Jennifer Widom,et al. Optimizing Graph Algorithms on Pregel-like Systems , 2014, Proc. VLDB Endow..
[61] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[62] Jing Li,et al. Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search , 2017, FPGA.
[63] Nachiket Kapre. Custom FPGA-based soft-processors for sparse graph acceleration , 2015, 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[64] Gustavo Alonso,et al. Centaur: A Framework for Hybrid CPU-FPGA Databases , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[65] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[66] Viktor K. Prasanna,et al. Optimizing memory performance for FPGA implementation of pagerank , 2015, 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig).
[67] Sudipto Guha,et al. Analyzing graph structure via linear measurements , 2012, SODA.
[68] Klaus Jansen,et al. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques , 2006, Lecture Notes in Computer Science.
[69] Torsten Hoefler,et al. Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages , 2015, HPDC.
[70] Sofya Vorotnikova,et al. A Simple, Space-Efficient, Streaming Algorithm for Matchings in Low Arboricity Graphs , 2018, SOSA@SODA.
[71] Jim Stevens,et al. Run-Time Services for Hybrid CPU/FPGA Systems on Chip , 2006, 2006 27th IEEE International Real-Time Systems Symposium (RTSS'06).
[72] Magnus Jahre,et al. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform , 2015, 2015 25th International Conference on Field Programmable Logic and Applications (FPL).
[73] Willy Zwaenepoel,et al. X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.
[74] Axel Jantsch,et al. Buffer minimization of real-time streaming applications scheduling on hybrid CPU/FPGA architectures , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[75] Sofya Vorotnikova,et al. Better Algorithms for Counting Triangles in Data Streams , 2016, PODS.
[76] Derek Chiou,et al. FPGA-Accelerated Transactional Execution of Graph Workloads , 2017, FPGA.
[77] Viktor K. Prasanna,et al. An FPGA framework for edge-centric graph processing , 2018, CF.
[78] Torsten Hoefler,et al. Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability , 2018, ASPLOS.
[79] Torsten Hoefler,et al. Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics , 2019, SC.
[80] Uzi Vishkin,et al. An O(log n) Parallel Connectivity Algorithm , 1982, J. Algorithms.
[81] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[82] Mikhail Kapralov,et al. Better bounds for matchings in the streaming model , 2012, SODA.
[83] Yao-Wen Chang,et al. Graph matching-based algorithms for FPGA segmentation design , 1998, 1998 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (IEEE Cat. No.98CB36287).
[84] Li Shang,et al. Dynamic power consumption in Virtex™-II FPGA family , 2002, FPGA '02.
[85] Hugo Liu,et al. ConceptNet — A Practical Commonsense Reasoning Tool-Kit , 2004 .
[86] Arijit Khan. Vertex-Centric Graph Processing: Good, Bad, and the Ugly , 2017, EDBT.
[87] Clifford Stein,et al. Introduction to Algorithms, 2nd edition. , 2001 .
[88] Claire Mathieu,et al. Maximum Matching in Semi-streaming with Few Passes , 2011, APPROX-RANDOM.
[89] Torsten Hoefler,et al. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication , 2016, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[90] Onur Mutlu,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015 .
[91] Wilfred Ng,et al. Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees , 2014, Proc. VLDB Endow..
[92] Kunle Olukotun,et al. GraphOps: A Dataflow Library for Graph Analytics Acceleration , 2016, FPGA.
[93] Michael Isard,et al. Scalability! But at what COST? , 2015, HotOS.
[94] Richard Szeliski,et al. A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[95] Ami Paz,et al. A (2 + ∊)-Approximation for Maximum Weight Matching in the Semi-Streaming Model , 2017, SODA.
[96] Dror Rawitz,et al. Local ratio: A unified framework for approximation algorithms. In Memoriam: Shimon Even 1935-2004 , 2004, CSUR.
[97] Martin Langhammer,et al. Arria™ 10 device architecture , 2015, 2015 IEEE Custom Integrated Circuits Conference (CICC).
[98] John Shalf,et al. Programming Abstractions for Data Locality , 2014 .
[99] Taieb Znati,et al. Algorithmic Aspects of Wireless Networks , 2007, EURASIP J. Wirel. Commun. Netw..
[100] Guy E. Blelloch,et al. GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.
[101] Jeff Mason,et al. CHiMPS: A C-level compilation flow for hybrid CPU-FPGA architectures , 2008, 2008 International Conference on Field Programmable Logic and Applications.
[102] Camil Demetrescu,et al. Trading off space for passes in graph streaming problems , 2009, SODA '06.
[103] Gregory D. Peterson,et al. Sparse Matrix-Vector Multiplication Design on FPGAs , 2007, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2007).
[104] Christian Sohler,et al. Counting triangles in data streams , 2006, PODS.
[105] Chengbo Yang,et al. An Efficient Dispatcher for Large Scale Graph Processing on OpenCL-based FPGAs , 2018, ArXiv.
[106] Torsten Hoefler,et al. Graph Processing on FPGAs: Taxonomy, Survey, Challenges , 2019, ArXiv.
[107] Franz Franchetti,et al. Mathematical foundations of the GraphBLAS , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[108] Torsten Hoefler,et al. Transformations of High-Level Synthesis Codes for High-Performance Computing , 2018, IEEE Transactions on Parallel and Distributed Systems.
[109] Michael Crouch,et al. Improved Streaming Algorithms for Weighted Matching, via Unweighted Matching , 2014, APPROX-RANDOM.
[110] Prabhakar Raghavan,et al. Computing on data streams , 1999, External Memory Algorithms.
[111] Gunnar Rätsch,et al. Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons , 2019, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[112] Sanjeev Khanna,et al. Approximating matching size from random streams , 2014, SODA.
[113] Reuven Bar-Yehuda,et al. A Local-Ratio Theorem for Approximating the Weighted Vertex Cover Problem , 1983, WG.
[114] Leah Epstein,et al. Improved Approximation Guarantees for Weighted Matching in the Semi-streaming Model , 2009, SIAM J. Discret. Math..
[115] Torsten Hoefler,et al. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations , 2017, HPDC.
[116] Charu C. Aggarwal,et al. Evolutionary Network Analysis , 2014, ACM Comput. Surv..
[117] Graham Cormode,et al. Annotations in Data Streams , 2009, ICALP.
[118] Tim J. Harris,et al. A survey of PRAM simulation techniques , 1994, CSUR.
[119] David L. Andrews,et al. Extending the thread programming model across cpu and fpga hybrid architectures , 2005 .
[120] Peter J. Ashenden,et al. Programming models for hybrid CPU/FPGA chips , 2004, Computer.
[121] Pascal Benoit,et al. Run-time mapping and communication strategies for Homogeneous NoC-Based MPSoCs , 2007 .
[122] Wayne Luk,et al. A framework for FPGA acceleration of large graph problems: Graphlet counting case study , 2011, 2011 International Conference on Field-Programmable Technology.
[123] Shaoli Liu,et al. Cambricon-X: An accelerator for sparse neural networks , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[124] Nachiket Kapre,et al. GraphStep: A System Architecture for Sparse-Graph Algorithms , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[125] Andrew McGregor,et al. Finding Graph Matchings in Data Streams , 2005, APPROX-RANDOM.
[126] AngryCalc. GeForce GTX 1080 Ti , 2018 .
[127] Peter J. Ashenden,et al. Programming models for hybrid FPGA-cpu computational components: a missing link , 2004, IEEE Micro.
[128] Ra Inta,et al. The "Chimera": An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform , 2012, Int. J. Reconfigurable Comput..
[129] Zhi-Zhong Chen,et al. Parallel approximation algorithms for maximum weighted matching in general graphs , 2000, Inf. Process. Lett..
[130] Viktor K. Prasanna,et al. Sparse Matrix-Vector multiplication on FPGAs , 2005, FPGA '05.
[131] Yogesh L. Simmhan,et al. GoFFish: A Sub-graph Centric Framework for Large-Scale Graph Analytics , 2013, Euro-Par.
[132] Ozcan Ozturk,et al. Energy Efficient Architecture for Graph Analytics Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[133] Yang Li,et al. Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model , 2016, SODA.
[134] Piotr Indyk,et al. Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..
[135] Joseph M. Hellerstein,et al. GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.
[136] Torsten Hoefler,et al. Enabling highly-scalable remote memory access programming with MPI-3 one sided , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[137] Sotirios G. Ziavras,et al. Performance-Energy Tradeoffs for Matrix Multiplication on FPGA-Based Mixed-Mode Chip Multiprocessors , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).
[138] Aranyak Mehta,et al. Online bipartite matching with unknown distributions , 2011, STOC '11.
[139] Yu Wang,et al. A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration , 2012, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors.
[140] Sudipto Guha,et al. Graph sketches: sparsification, spanners, and subgraphs , 2012, PODS.
[141] Torsten Hoefler,et al. Log(graph): a near-optimal high-performance graph representation , 2018, PACT.
[142] Torsten Hoefler,et al. Substream-Centric Maximum Matchings on FPGA , 2019, FPGA.
[143] Jason Cong,et al. A quantitative analysis on microarchitectures of modern CPU-FPGA platforms , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[144] Graham Cormode,et al. Independent Sets in Vertex-Arrival Streams , 2018, ICALP.