Real-time video scene analysis with heterogeneous processors

Field-Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPUs) allow acceleration and real-time processing of computationally intensive computer vision algorithms. The decision to use either architecture in a given application is determined by task-specific priorities such as processing latency, power consumption and algorithm accuracy. This choice is normally made at design time on a heuristic or fixed algorithmic basis; here we propose an alternative method for automatic runtime selection. In this thesis, we describe our PC-based system architecture containing both platforms; this provides greater flexibility and allows dynamic selection of processing platforms to suit changing scene priorities. Using the Histograms of Oriented Gradients (HOG) algorithm for pedestrian detection, we comprehensively explore algorithm implementation on FPGA, GPU and a combination of both, and show that data transfer time has a significant effect on overall processing performance. We also characterise the performance of each implementation and quantify the tradeoffs between power, time and accuracy when moving processing between architectures, then specify the optimal architecture to use when prioritising each of these. We apply this new knowledge to a real-time surveillance application representative of anomaly detection problems: detecting parked vehicles in videos. Detections are generated by motion detection and by car and pedestrian HOG detectors implemented across multiple architectures; trajectory clustering and a Bayesian contextual motion algorithm then combine these detections into an overall scene anomaly level. This level is in turn used to select the architectures on which the compute-intensive detectors run for the next frame, with higher anomaly levels selecting faster, higher-power implementations.
Comparing dynamic context-driven prioritisation of system performance against a fixed mapping of algorithms to architectures shows that our dynamic mapping method is 10% more accurate at detecting events than the power-optimised version, at the cost of 12 W higher power consumption.
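
The per-frame selection step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the names `Implementation` and `select_implementation`, the three generic implementation labels, the assumption that the anomaly level is normalised to [0, 1], the equal-width mapping, and all power and latency figures are hypothetical placeholders chosen only to show the idea that a higher anomaly level selects a faster, higher-power implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One detector implementation; all figures below are hypothetical."""
    name: str
    power_w: float     # placeholder power draw in watts
    latency_ms: float  # placeholder per-frame processing time


# Placeholder values for illustration only, not measurements from the thesis.
IMPLEMENTATIONS = [
    Implementation("low_power", power_w=10.0, latency_ms=100.0),
    Implementation("balanced", power_w=20.0, latency_ms=50.0),
    Implementation("fast", power_w=30.0, latency_ms=25.0),
]


def select_implementation(anomaly_level, implementations=IMPLEMENTATIONS):
    """Pick the implementation for the next frame.

    Assumes anomaly_level is normalised to [0, 1]. Implementations are
    ranked from slowest to fastest, and the anomaly range is split into
    equal-width bands, one per implementation.
    """
    if not 0.0 <= anomaly_level <= 1.0:
        raise ValueError("anomaly_level must lie in [0, 1]")
    ranked = sorted(implementations, key=lambda i: i.latency_ms, reverse=True)
    # Clamp so that anomaly_level == 1.0 maps to the fastest implementation.
    index = min(int(anomaly_level * len(ranked)), len(ranked) - 1)
    return ranked[index]


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0):
        chosen = select_implementation(level)
        print(f"anomaly={level:.1f} -> {chosen.name}")
```

In this sketch a quiet scene keeps the detectors on the lowest-power option, and the mapping escalates as the anomaly level rises; a real system would derive the bands from measured power, time and accuracy characteristics rather than from equal-width thresholds.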
