Paper Information - Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments

Kaibo Wang

[1] Jie Cheng,et al. Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..

[2] Xiaoning Ding,et al. BWS: balanced work stealing for time-sharing multicores , 2012, EuroSys '12.

[3] Jun Kong,et al. A data model and database for high-resolution pathology analytical image informatics , 2011, Journal of pathology informatics.

[4] Vanish Talwar,et al. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems , 2011, USENIX Annual Technical Conference.

[5] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.

[6] Anastasia Ailamaki,et al. QPipe: a simultaneously pipelined relational query engine , 2005, SIGMOD '05.

[7] Juan Pineda,et al. A parallel algorithm for polygon rasterization , 1988, SIGGRAPH.

[8] Jun Kong,et al. Integrated morphologic analysis for the identification and characterization of disease subtypes , 2012, J. Am. Medical Informatics Assoc..

[9] Bingsheng He,et al. High-Throughput Transaction Executions on Graphics Processors , 2011, Proc. VLDB Endow..

[10] Daniel T. Larose,et al. Discovering Knowledge in Data: An Introduction to Data Mining , 2005 .

[11] Dawson R. Engler,et al. Exterminate all operating system abstractions , 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).

[12] Sudhakar Yalamanchili,et al. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[13] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[14] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[15] Seungyeop Han,et al. SSLShader: Cheap SSL Acceleration with Commodity Processors , 2011, NSDI.

[16] Peter Benjamin Volk,et al. GPU join processing revisited , 2012, DaMoN '12.

[17] James Demmel,et al. the Parallel Computing Landscape , 2022 .

[18] Yuan Yuan,et al. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices , 2013, Proc. VLDB Endow..

[19] Hyesoon Kim. Supporting virtual memory in GPGPU without supporting precise exceptions , 2012, MSPC '12.

[20] Bingsheng He,et al. Relational joins on graphics processors , 2008, SIGMOD Conference.

[21] Peter J. Denning,et al. Third Generation Computer Systems , 1971, CSUR.

[22] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.

[23] Feng Ji,et al. RSVM: A Region-based Software Virtual Memory for GPU , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[24] Divyakant Agrawal,et al. Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations , 2004, VLDB.

[25] Christos Faloutsos,et al. Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[26] Mark de Berg,et al. Computational geometry: algorithms and applications , 1997 .

[27] Peter J. Denning,et al. Virtual memory , 1970, CSUR.

[28] Dinesh Manocha,et al. Fast computation of database operations using graphics processors , 2005, SIGGRAPH Courses.

[29] Sebastian Breß,et al. Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS , 2013, Proc. VLDB Endow..

[30] Mohan S. Kankanhalli,et al. Calculating the Area of Overlaid Polygons Without Constructing the Overlay , 1994 .

[31] Ronald L. Wasserstein,et al. Monte Carlo: Concepts, Algorithms, and Applications , 1997 .

[32] John Poulton. An embedded DRAM for CMOS ASICs , 1997, Proceedings Seventeenth Conference on Advanced Research in VLSI.

[33] Joseph O'Rourke,et al. Computational Geometry in C. , 1995 .

[34] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.

[35] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.

[36] Martin L. Kersten,et al. The researcher's guide to the data deluge , 2011, Proc. VLDB Endow..

[37] David J. DeWitt,et al. Building a scaleable geo-spatial DBMS: technology, implementation, and evaluation , 1997, SIGMOD '97.

[38] Gang Wang,et al. Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units , 2011, Proc. VLDB Endow..

[39] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[40] Michael R. Macedonia,et al. The GPU Enters Computing's Mainstream , 2003, Computer.

[41] Shinpei Kato,et al. GDM: device memory management for gpgpu computing , 2014, SIGMETRICS '14.

[42] Fusheng Wang,et al. YSmart: Yet Another SQL-to-MapReduce Translator , 2011, 2011 31st International Conference on Distributed Computing Systems.

[43] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[44] Karthikeyan Sankaralingam,et al. iGPU: Exception support and speculative execution on GPUs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[45] Pradeep Dubey,et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.

[46] Alexey Kukanov,et al. The Foundations for Scalable Multicore Software in Intel Threading Building Blocks , 2007 .

[47] Pradeep Dubey,et al. FAST: fast architecture sensitive tree search on modern CPUs and GPUs , 2010, SIGMOD Conference.

[48] M. Berger,et al. Adaptive mesh refinement for hyperbolic partial differential equations , 1982 .

[49] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[50] Magdalena Balazinska,et al. Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help? , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[51] Jingren Zhou,et al. SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[52] Dinesh Manocha,et al. GPUTeraSort: high performance graphics co-processor sorting for large database management , 2006, SIGMOD Conference.

[53] Volker Markl,et al. A First Step Towards GPU-assisted Query Optimization , 2012, ADMS@VLDB.

[54] Volker Markl,et al. Hardware-Oblivious Parallelism for In-Memory Column-Stores , 2013, Proc. VLDB Endow..

[55] Martin L. Kersten,et al. Waste not… Efficient co-processing of relational data , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[56] Shinpei Kato,et al. Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.

[57] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[58] Kenneth A. Ross,et al. Ameliorating memory contention of OLAP operators on GPU processors , 2012, DaMoN '12.

[59] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.

[60] Divyakant Agrawal,et al. Hardware acceleration for spatial selections and joins , 2003, SIGMOD '03.

[61] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.

[62] Martin L. Kersten,et al. Accelerating Foreign-Key Joins using Asymmetric Memory Channels , 2011, ADMS@VLDB.

[63] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.

[64] Joel H. Saltz,et al. Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems , 2012, Proc. VLDB Endow..

[65] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.