Efficient Algorithms for the Summed Area Tables Primitive on GPUs
Satoshi Matsuoka | Mohamed Wahib | Ryousei Takano | Peng Chen | Shin'ichiro Takizawa
[1] Robert Laganière, et al. Fast LBP Face Detection on Low-Power SIMD Architectures, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[2] Libor Preucil, et al. FPGA based Speeded Up Robust Features, 2009, 2009 IEEE International Conference on Technologies for Practical Robot Applications.
[3] John D. Owens, et al. Register packing for cyclic reduction: a case study, 2011, GPGPU-4.
[4] Guna Seetharaman, et al. Efficient GPU Implementation of the Integral Histogram, 2012, ACCV Workshops.
[5] Derek Bradley, et al. Adaptive Thresholding using the Integral Image, 2007, J. Graph. Tools.
[6] Tack-Don Han, et al. A Scalable Work-Efficient and Depth-Optimal Parallel Scan for the GPGPU Environment, 2013, IEEE Transactions on Parallel and Distributed Systems.
[7] Martin Burtscher, et al. Higher-order and tuple-based massively-parallel prefix sums, 2016, PLDI.
[8] Guy E. Blelloch, et al. Prefix sums and their applications, 1990.
[9] Xavier Martorell, et al. Real-time GPU-based face detection in HD video sequences, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[10] Harold S. Stone, et al. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations, 1973, IEEE Transactions on Computers.
[11] G. Bradski, et al. Detailed Guide to OpenCV: Image Processing and Recognition Using a Computer Vision Library, 2009.
[12] Desanka Polajnar, et al. Local binary pattern network: A deep learning approach for face recognition, 2016, 2016 IEEE International Conference on Image Processing (ICIP).
[13] Jiří Machač. Intel Integrated Performance Primitives and their use in application development, 2008.
[14] Youngbae Hwang, et al. Memory-efficient SURF architecture for ASIC implementation, 2014.
[15] Jean-Pierre Dérutin, et al. SIMD, SMP and MIMD-DM parallel approaches for real-time 2D image stabilization, 2005, Seventh International Workshop on Computer Architecture for Machine Perception (CAMP'05).
[16] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.
[17] H. T. Kung, et al. A Regular Layout for Parallel Adders, 1982, IEEE Transactions on Computers.
[18] Yu Wei, et al. FPGA implementation of AdaBoost algorithm for detection of face biometrics, 2004, IEEE International Workshop on Biomedical Circuits and Systems, 2004.
[19] Deming Chen, et al. A novel SoC architecture on FPGA for ultra fast face detection, 2009, 2009 IEEE International Conference on Computer Design.
[20] Jaime S. Cardoso, et al. Deep Local Binary Patterns, 2017, ArXiv.
[21] Yongchao Liu, et al. LightScan: Faster Scan Primitive on CUDA Compatible Manycore Processors, 2016, ArXiv.
[22] Margarita Amor, et al. Efficient Scan Operator Methods on a GPU, 2014, 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing.
[23] M. Hosseinzadeh, et al. Fast Overflow Detection in Moduli Set $\{2^n - 1, 2^n, 2^n + 1\}$, 2011.
[24] Narayanan Vijaykrishnan, et al. A parallel architecture for hardware face detection, 2006, IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06).
[25] Hirotaka Tamura, et al. Fast algorithm using summed area tables with unified layer performing convolution and average pooling, 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).
[26] Rodolfo S. Lima, et al. GPU-efficient recursive filtering and summed-area tables, 2011, SA '11.
[27] Ichiro Masaki, et al. Efficient integral image computation on the GPU, 2010, 2010 IEEE Intelligent Vehicles Symposium.
[28] Bohyung Han, et al. Bayesian Filtering and Integral Image for Visual Tracking, 2005.
[29] Diederik Verkest, et al. Real-time high-definition stereo matching on FPGA, 2011, FPGA '11.
[30] Akihiko Kasagi, et al. Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations, 2014, 2014 43rd International Conference on Parallel Processing.
[31] Xinxin Mei, et al. Benchmarking the Memory Hierarchy of Modern GPUs, 2014, NPC.
[32] J. P. Lewis, et al. Fast Template Matching, 2009.
[33] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, ArXiv.
[34] Klaus D. McDonald-Maier, et al. Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems, 2015, Sensors.
[35] Pablo Enfedaque, et al. Implementation of the DWT in a GPU through a Register-based Strategy, 2015, IEEE Transactions on Parallel and Distributed Systems.
[36] Shengen Yan, et al. A fast integral image generation algorithm on GPUs, 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[37] Jeff Nichols, et al. Announcing Supercomputer Summit, 2016.
[38] Anselmo Lastra, et al. Fast Summed-Area Table Generation and its Applications, 2005, Comput. Graph. Forum.
[39] Guna Seetharaman, et al. Fast Integral Histogram Computations on GPU for Real-Time Video Analytics, 2017, ArXiv.
[40] Vinod Nair, et al. An FPGA-Based People Detection System, 2005, EURASIP J. Adv. Signal Process.
[41] Franklin C. Crow, et al. Summed-area tables for texture mapping, 1984, SIGGRAPH.
[42] Shuicheng Yan, et al. An HOG-LBP human detector with partial occlusion handling, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[43] Satoshi Matsuoka. The Road to TSUBAME and Beyond, 2008.
[44] Satoshi Matsuoka. Being "BYTES-oriented" in HPC leads to an open big data/AI ecosystem and further advances into the post-moore era, 2017, BigData.
[45] Anselmo Lastra, et al. Fast HDR Image-Based Lighting Using Summed-Area Tables, 2006.
[46] Luc Van Gool, et al. SURF: Speeded Up Robust Features, 2006, ECCV.
[47] André Seznec, et al. Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[48] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[49] Manuel M. Oliveira. Real-Time Photographic Local Tone Reproduction Using Summed-Area Tables, 2008.
[50] Raul Queiroz Feitosa, et al. Real-Time Object Tracking in High-Definition Video Using Frame Segmentation and Background Integral Images, 2013, 2013 XXVI Conference on Graphics, Patterns and Images.
[51] Christopher H. Messom, et al. Stream Processing of Geometric and Central Moments Using High Precision Summed Area Tables, 2008, ICONIP.
[52] Alexander Toet, et al. Speed-up Template Matching through Integral Image based Weak Classifiers, 2014.