Optimizing indirect memory references with milk
[1] Uzi Vishkin et al. An O(log n) Parallel Connectivity Algorithm, 1982, J. Algorithms.
[2] Harry Berryman et al. Run-Time Scheduling and Execution of Loops on Message Passing Machines, 1990, J. Parallel Distributed Comput.
[3] Leslie G. Valiant. A bridging model for parallel computation, 1990, CACM.
[4] Jeffrey F. Naughton et al. Cache Conscious Algorithms for Relational Query Processing, 1994, VLDB.
[5] Matteo Frigo et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.
[6] Chau-Wen Tseng et al. Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes, 1998, LCPC.
[7] Ken Kennedy et al. Improving cache performance in dynamic applications through data and computation reorganization at run time, 1999, PLDI '99.
[8] Larry Carter et al. Localizing non-affine array references, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Rajeev Motwani et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.
[10] L. Amaral et al. The web of human sexual contacts, 2001, Nature.
[11] Martin L. Kersten et al. Optimizing Main-Memory Join on Modern Hardware, 2002, IEEE Trans. Knowl. Data Eng.
[12] Peter Sanders et al. Δ-stepping: a parallelizable shortest path algorithm, 2003, J. Algorithms.
[13] Ulrich Meyer et al. Δ-stepping: a parallelizable shortest path algorithm, 2003, J. Algorithms.
[14] Christos Faloutsos et al. R-MAT: A Recursive Model for Graph Mining, 2004, SDM.
[15] Mark Newman et al. Detecting community structure in networks, 2004.
[16] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[17] Vikram S. Adve et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization (CGO 2004).
[18] Sherry Marcus et al. Graph-based technologies for intelligence analysis, 2004, CACM.
[19] David A. Bader et al. On the architectural requirements for efficient execution of graph algorithms, 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[20] Jimmy Su et al. Automatic support for irregular computations in a high-level language, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[21] Chau-Wen Tseng et al. Exploiting locality for irregular scientific codes, 2006, IEEE Transactions on Parallel and Distributed Systems.
[22] Yogish Sabharwal et al. Optimizing the HPCC RandomAccess benchmark on Blue Gene/L Supercomputer, 2006, SIGMETRICS '06/Performance '06.
[23] Pradeep Dubey et al. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs, 2009, Proc. VLDB Endow.
[24] Keshav Pingali et al. The tao of parallelism in algorithms, 2011, PLDI '11.
[25] Peter Sanders et al. Engineering a Multi-core Radix Sort, 2011, Euro-Par.
[26] Alfons Kemper et al. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems, 2012, Proc. VLDB Endow.
[27] Carlos Guestrin et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, 2012.
[28] David A. Patterson et al. Direction-optimizing Breadth-First Search, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[29] Guy E. Blelloch et al. Internally deterministic parallel algorithms can be fast, 2012, PPoPP '12.
[30] Marco Rosa et al. Four degrees of separation, 2011, WebSci '12.
[31] Gustavo Alonso et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware, 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[32] Guy E. Blelloch et al. Ligra: a lightweight graph processing framework for shared memory, 2013, PPoPP '13.
[33] Keshav Pingali et al. A lightweight infrastructure for graph analytics, 2013, SOSP.
[34] Mahmut T. Kandemir et al. Trading cache hit rate for memory performance, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[35] Pradeep Dubey et al. Navigating the maze of graph analytics frameworks using massive graph datasets, 2014, SIGMOD Conference.
[36] Jaejin Lee et al. Versatile and scalable parallel histogram construction, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[37] Jeff Chamberlain et al. Ivy Bridge Server: A Converged Design, 2015, IEEE Micro.
[38] David A. Patterson et al. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server, 2015, 2015 IEEE International Symposium on Workload Characterization.
[39] Avery Ching et al. One Trillion Edges: Graph Processing at Facebook-Scale, 2015, Proc. VLDB Endow.
[40] Jacob Nelson et al. Latency-Tolerant Software Distributed Shared Memory, 2015, USENIX Annual Technical Conference.
[41] Torsten Hoefler et al. Evaluating the Cost of Atomic Operations on Modern Architectures, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[42] David A. Patterson et al. The GAP Benchmark Suite, 2015, arXiv.
[43] Gustavo Alonso et al. Main-Memory Hash Joins on Modern Processor Architectures, 2015, IEEE Transactions on Knowledge and Data Engineering.