A software-based dynamic-warp scheduling approach for load-balancing the Viola-Jones face detection algorithm on GPUs

Face detection is a key component in applications such as security surveillance and human-computer interaction systems, and real-time recognition is essential in many scenarios. The Viola-Jones algorithm is an attractive means of meeting the real-time requirement, and has been widely implemented on custom hardware, FPGAs, and GPUs. We demonstrate a GPU implementation that achieves competitive performance at low development cost. Our solution handles the irregularity inherent in the algorithm with a novel dynamic warp scheduling approach that eliminates thread divergence. The scheme also employs a thread pool mechanism, which significantly reduces the cost of creating, switching, and terminating threads. Compared to static thread scheduling, our dynamic warp scheduling approach reduces execution time by a factor of 3. To maximize detection throughput, we also run on multiple GPUs, achieving 95.6 frames per second (FPS) on 5 Fermi GPUs.
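The load imbalance that motivates dynamic scheduling can be illustrated with a small CPU-side model. The sketch below is not the implementation described above and is not GPU code: it assumes that each detection window is rejected after some number of cascade stages (its "depth"), that a statically scheduled warp stays busy until its deepest window finishes, and that a dynamically scheduled system repacks surviving windows into full warps after every stage. The function names, the 22-stage cascade length, and the rejection probability are all hypothetical choices made for illustration.

```python
import math
import random

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def static_warp_steps(depths, warp_size=WARP_SIZE):
    """Static model: each warp owns a fixed chunk of windows and runs
    until the deepest window in that chunk has left the cascade."""
    return sum(
        max(depths[i:i + warp_size])
        for i in range(0, len(depths), warp_size)
    )


def dynamic_warp_steps(depths, warp_size=WARP_SIZE):
    """Dynamic model (an assumption): after every stage the surviving
    windows are repacked, so only full warps (plus one remainder) run."""
    steps = 0
    stage = 1
    while True:
        alive = sum(1 for d in depths if d >= stage)
        if alive == 0:
            break
        steps += math.ceil(alive / warp_size)
        stage += 1
    return steps


def make_depths(n, max_stages=22, reject_prob=0.5, seed=0):
    """Hypothetical workload: each window survives a stage with
    probability 1 - reject_prob, up to max_stages stages."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n):
        d = 1
        while d < max_stages and rng.random() > reject_prob:
            d += 1
        depths.append(d)
    return depths


if __name__ == "__main__":
    depths = make_depths(4096)
    print("static warp-steps: ", static_warp_steps(depths))
    print("dynamic warp-steps:", dynamic_warp_steps(depths))
```

As a hand-checkable case, with a warp size of 4 and depths `[1, 1, 1, 4, 1, 1, 1, 4]`, the static model needs 8 warp-steps (each warp waits on its one deep window), while the dynamic model needs 5. The model counts only warp-steps; it ignores queueing, memory traffic, and thread-management overhead, so it should not be expected to reproduce the speedup reported above.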
