Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments

[1]  Jie Cheng,et al.  Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..

[2]  Xiaoning Ding,et al.  BWS: balanced work stealing for time-sharing multicores , 2012, EuroSys '12.

[3]  Jun Kong,et al.  A data model and database for high-resolution pathology analytical image informatics , 2011, Journal of pathology informatics.

[4]  Vanish Talwar,et al.  Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems , 2011, USENIX Annual Technical Conference.

[5]  Pradeep Dubey,et al.  Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.

[6]  Anastasia Ailamaki,et al.  QPipe: a simultaneously pipelined relational query engine , 2005, SIGMOD '05.

[7]  Juan Pineda,et al.  A parallel algorithm for polygon rasterization , 1988, SIGGRAPH.

[8]  Jun Kong,et al.  Integrated morphologic analysis for the identification and characterization of disease subtypes , 2012, J. Am. Medical Informatics Assoc..

[9]  Bingsheng He,et al.  High-Throughput Transaction Executions on Graphics Processors , 2011, Proc. VLDB Endow..

[10]  Daniel T. Larose,et al.  Discovering Knowledge in Data: An Introduction to Data Mining , 2005 .

[11]  Dawson R. Engler,et al.  Exterminate all operating system abstractions , 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).

[12]  Sudhakar Yalamanchili,et al.  Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  Aaftab Munshi,et al.  The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[14]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[15]  Seungyeop Han,et al.  SSLShader: Cheap SSL Acceleration with Commodity Processors , 2011, NSDI.

[16]  Peter Benjamin Volk,et al.  GPU join processing revisited , 2012, DaMoN '12.

[17]  James Demmel,et al.  the Parallel Computing Landscape , 2022 .

[18]  Yuan Yuan,et al.  The Yin and Yang of Processing Data Warehousing Queries on GPU Devices , 2013, Proc. VLDB Endow..

[19]  Hyesoon Kim.  Supporting virtual memory in GPGPU without supporting precise exceptions , 2012, MSPC '12.

[20]  Bingsheng He,et al.  Relational joins on graphics processors , 2008, SIGMOD Conference.

[21]  Peter J. Denning,et al.  Third Generation Computer Systems , 1971, CSUR.

[22]  John E. Stone,et al.  An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.

[23]  Feng Ji,et al.  RSVM: A Region-based Software Virtual Memory for GPU , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[24]  Divyakant Agrawal,et al.  Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations , 2004, VLDB.

[25]  Christos Faloutsos,et al.  Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[26]  Mark de Berg,et al.  Computational geometry: algorithms and applications , 1997 .

[27]  Peter J. Denning,et al.  Virtual memory , 1970, CSUR.

[28]  Dinesh Manocha,et al.  Fast computation of database operations using graphics processors , 2005, SIGGRAPH Courses.

[29]  Sebastian Breß,et al.  Why it is time for a HyPE: A Hybrid Query Processing Engine for Efficient GPU Coprocessing in DBMS , 2013, Proc. VLDB Endow..

[30]  Mohan S. Kankanhalli,et al.  Calculating the Area of Overlaid Polygons Without Constructing the Overlay , 1994 .

[31]  Ronald L. Wasserstein,et al.  Monte Carlo: Concepts, Algorithms, and Applications , 1997 .

[32]  John Poulton.  An embedded DRAM for CMOS ASICs , 1997, Proceedings Seventeenth Conference on Advanced Research in VLSI.

[33]  Joseph O'Rourke,et al.  Computational Geometry in C. , 1995 .

[34]  Idit Keidar,et al.  GPUfs: Integrating a file system with GPUs , 2013, TOCS.

[35]  Shinpei Kato,et al.  TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.

[36]  Martin L. Kersten,et al.  The researcher's guide to the data deluge , 2011, Proc. VLDB Endow..

[37]  David J. DeWitt,et al.  Building a scaleable geo-spatial DBMS: technology, implementation, and evaluation , 1997, SIGMOD '97.

[38]  Gang Wang,et al.  Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units , 2011, Proc. VLDB Endow..

[39]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[40]  Michael R. Macedonia,et al.  The GPU Enters Computing's Mainstream , 2003, Computer.

[41]  Shinpei Kato,et al.  GDM: device memory management for GPGPU computing , 2014, SIGMETRICS '14.

[42]  Fusheng Wang,et al.  YSmart: Yet Another SQL-to-MapReduce Translator , 2011, 2011 31st International Conference on Distributed Computing Systems.

[43]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[44]  Karthikeyan Sankaralingam,et al.  iGPU: Exception support and speculative execution on GPUs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[45]  Pradeep Dubey,et al.  Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort , 2010, SIGMOD Conference.

[46]  Alexey Kukanov,et al.  The Foundations for Scalable Multicore Software in Intel Threading Building Blocks , 2007 .

[47]  Pradeep Dubey,et al.  FAST: fast architecture sensitive tree search on modern CPUs and GPUs , 2010, SIGMOD Conference.

[48]  M. Berger,et al.  Adaptive mesh refinement for hyperbolic partial differential equations , 1982 .

[49]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[50]  Magdalena Balazinska,et al.  Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help? , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[51]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[52]  Dinesh Manocha,et al.  GPUTeraSort: high performance graphics co-processor sorting for large database management , 2006, SIGMOD Conference.

[53]  Volker Markl,et al.  A First Step Towards GPU-assisted Query Optimization , 2012, ADMS@VLDB.

[54]  Volker Markl,et al.  Hardware-Oblivious Parallelism for In-Memory Column-Stores , 2013, Proc. VLDB Endow..

[55]  Martin L. Kersten,et al.  Waste not… Efficient co-processing of relational data , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[56]  Shinpei Kato,et al.  Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.

[57]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[58]  Kenneth A. Ross,et al.  Ameliorating memory contention of OLAP operators on GPU processors , 2012, DaMoN '12.

[59]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.

[60]  Divyakant Agrawal,et al.  Hardware acceleration for spatial selections and joins , 2003, SIGMOD '03.

[61]  Mark Silberstein,et al.  PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.

[62]  Martin L. Kersten,et al.  Accelerating Foreign-Key Joins using Asymmetric Memory Channels , 2011, ADMS@VLDB.

[63]  David I. August,et al.  Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.

[64]  Joel H. Saltz,et al.  Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems , 2012, Proc. VLDB Endow..

[65]  Yao Zhang,et al.  A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.