Achieving Scalability in a k-NN Multi-GPU Network Service with Centaur
Mark Silberstein | Ohad Shacham | Edward Bortnikov | Alexander Libov | Amir Watad
[1] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.
[2] Keinosuke Fukunaga, et al. A Branch and Bound Algorithm for Computing k-Nearest Neighbors, 1975, IEEE Transactions on Computers.
[3] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.
[4] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.
[5] Rajkumar Buyya, et al. High Performance Cluster Computing, 1999.
[6] Michel Barlaud, et al. Fast k nearest neighbor search using GPU, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[7] C. Moallemi, et al. The Cost of Latency, 2009.
[8] Liheng Jian, et al. CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU, 2009, 2009 IEEE Youth Conference on Information, Computing and Telecommunication.
[9] Lei Zhao, et al. A Practical GPU Based KNN Algorithm, 2009.
[10] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[11] Frank Nielsen, et al. K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching, 2010, 2010 IEEE International Conference on Image Processing.
[12] Matthijs Douze, et al. Searching in one billion vectors: Re-rank with source coding, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Mark Silberstein, et al. PTask: operating system abstractions to manage GPUs as compute devices, 2011, SOSP.
[14] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.
[16] Vivek Sarkar, et al. Dynamic Task Parallelism with a GPU Work-Stealing Runtime System, 2011, LCPC.
[17] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.
[18] Seungyeop Han, et al. SSLShader: Cheap SSL Acceleration with Commodity Processors, 2011, NSDI.
[19] Jeff A. Stuart, et al. A study of Persistent Threads style GPU programming for GPGPU workloads, 2012, 2012 Innovative Parallel Computing (InPar).
[20] John D. Owens, et al. A GPU Task-Parallel Model with Dependency Resolution, 2012, Computer.
[21] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[22] Laxmi N. Bhuyan, et al. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures, 2013, TACO.
[23] Luiz André Barroso, et al. The tail at scale, 2013, CACM.
[24] Idit Keidar, et al. GPUfs: Integrating a file system with GPUs, 2013, TOCS.
[25] Mark Silberstein, et al. GPUnet, 2014, OSDI.
[26] Rashmi Agrawal. K-Nearest Neighbor for Uncertain Data, 2014.
[27] Jun Pang, et al. Rhythm: harnessing data parallel hardware for server workloads, 2014, ASPLOS.
[28] Matt Welsh. SEDA: An Architecture for Highly Concurrent Server Applications, 2015.
[29] Mike O'Connor, et al. MemcachedGPU: scaling-up scale-out key-value stores, 2015, SoCC.
[30] Mark Silberstein, et al. GPUrdma: GPU-side library for high performance networking from GPU kernels, 2016, ROSS@HPDC.
[31] Torsten Hoefler, et al. dCUDA: Hardware Supported Overlap of Computation and Communication, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[32] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[33] Mark Silberstein, et al. ActivePointers: A Case for Software Address Translation on GPUs, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[34] Rudolf Eigenmann, et al. Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks, 2017, PPOPP.