Self adaptable multithreaded object detection on embedded multicore systems

Leveraging multithreading on embedded multicore platforms has proven effective for handling the increasing resolution of the target stimuli in object detection. However, the complex tradeoffs and correlated design impacts between a parallel application and the underlying multicore platform call for an effective and adaptable multithreaded design. This paper introduces a hybrid multithreaded object detection design with high parallelism and extensive data reuse. A self-adaptable flow is proposed that adjusts the multithreaded object detection to fully exploit various embedded multicore architectures. ARM-based cycle-accurate simulations of multicore systems show the superior performance of the proposed design.

Comprehensive design exploration for a multithreaded object detection algorithm.
A Multi-Staged Classifier Grouping scheme to improve data reuse in the local cache.
A self-adaptable design flow to auto-tune design parameters for a multicore system.
In-depth performance evaluation with an ARM-based cycle-accurate simulator.
