A Hardware-Efficient Recognition Accelerator Using Haar-Like Feature and SVM Classifier

Significant performance improvements in learning algorithms have revived interest in computer vision for recognition applications during the current decade. This paper reports a vision-based hardware recognition architecture that combines Haar-like feature extraction with support vector machine (SVM) classification. To achieve a good tradeoff between resource requirements, processing speed, and recognition accuracy, we propose a 12-bit fixed-point computation for block-based feature normalization and a recycling allocation of minimized memory resources. Furthermore, configurable windows with high size flexibility enable efficient scale generation of the target objects. Additionally, a parallel-partial SVM-classification architecture improves the recognition speed by accumulating the partially completed SVM results for multiple windows in parallel. The proposed hardware architecture is verified on an Altera DE4 platform, where it achieves throughput rates of 216 and 70 f/s for XGA ($1024\times 768$) and HD ($1920\times 1080$) video resolutions, respectively. A recycled memory space of only 193 KB is sufficient for processing high-resolution images of up to $2048\times 2048$ pixels during online testing. On the INRIA person dataset, the architecture reaches 89.81% average precision and a maximum accuracy of 96.93% for pedestrian recognition. Furthermore, the proposed hardware-architecture framework achieves about 99.08% accuracy on two car recognition tasks, one using the UIUC dataset (side view of cars) and one using a frontal car dataset collected by ourselves at Hiroshima University.
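The idea of accumulating partially completed SVM results for multiple windows can be illustrated with a minimal software sketch. It assumes a linear SVM and a feature vector that arrives in fixed-size blocks; the function names, the block size, and the integer toy data are hypothetical choices for illustration and do not describe the actual hardware datapath.

```python
from typing import List, Sequence


def full_score(weights: Sequence[int], features: Sequence[int], bias: int) -> int:
    """Reference linear SVM decision value: w . x + b, computed in one pass."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def parallel_partial_scores(
    windows: Sequence[Sequence[int]],
    weights: Sequence[int],
    bias: int,
    block_size: int,
) -> List[int]:
    """Accumulate the decision value of every window block by block.

    Each window keeps one accumulator. When a block of features becomes
    available, all accumulators are updated with the matching block of
    weights, so no window needs its complete feature vector stored at once.
    """
    accumulators = [bias] * len(windows)  # one partial result per window
    for start in range(0, len(weights), block_size):
        w_block = weights[start:start + block_size]
        # In hardware these per-window updates would run concurrently;
        # here they are simply iterated.
        for i, feats in enumerate(windows):
            x_block = feats[start:start + block_size]
            accumulators[i] += sum(w * x for w, x in zip(w_block, x_block))
    return accumulators


if __name__ == "__main__":
    weights = [2, -1, 3, 0, 1, -2]          # hypothetical trained weights
    windows = [[1, 2, 3, 4, 5, 6],          # hypothetical window features
               [0, 1, 0, 1, 0, 1]]
    scores = parallel_partial_scores(windows, weights, bias=-4, block_size=2)
    # A positive score would be classified as the target object.
    print(scores)
```

Because the dot product is a plain sum, the block-wise accumulation yields the same decision value as the one-pass computation; the benefit in a hardware setting is that only the accumulators, not the full feature vectors, need to be held per window.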
