A GPU-based tabu search for very large hardware/software partitioning with limited resource usage

In hardware/software (HW/SW) co-design, HW/SW partitioning is the most important step, since it determines which components are implemented in hardware and which in software. Because most HW/SW partitioning problems are NP-hard, heuristic methods are needed to solve them, especially for large instances. GPU-based heuristics are a promising way to reduce the run time of HW/SW co-design. However, existing methods cannot handle very large embedded applications because of GPU resource limitations. This paper presents a method that overcomes these limitations for very large partitioning problems while keeping the run time reasonable. First, for the stage that computes the costs of the candidates, we propose a fast 2-flipping computation for very large HW/SW co-design. The method is general and handles both odd and even numbers of nodes. More importantly, it avoids double-precision arithmetic units, which are a scarce resource in GPU architectures. Second, because GPU memory is limited and the costs of all candidates cannot be stored directly in the GPU's global memory, we present a time-space tradeoff strategy that breaks the memory limitation for very large HW/SW partitioning, so that the subsequent steps can run within the GPU's memory. Third, we propose an in-place removal of infeasible solutions that halves the global memory overhead of compacting the neighborhood. Fourth, when evaluating the tabu status of feasible candidates, we use a bitwise representation of the tabu status to minimize the transfer overhead. Finally, we conduct a number of experiments. The results show that the proposed 2-flipping method works well with single-precision data types. They also show that the proposed approach increases the largest task graph that can be handled from 10,000 to 30,000 nodes on a GPU with 6 GB of global memory. The correlation between compression intensity and solution quality is analyzed to ensure the fairness and soundness of our method. Our work is general and can provide guidance for other applications.
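
As a rough illustration of two ideas mentioned above, the following Python sketch shows (a) one way to enumerate 2-flip candidates with integer arithmetic only, for both odd and even node counts, and (b) a bit-packed tabu status. The rectangular grid layout, the function names, and the rule that a move is tabu when either of its nodes is tabu are assumptions made for illustration; they are not taken from the paper, whose actual formulas may differ.

```python
WORD_BITS = 32  # assumed word size, matching a 32-bit GPU integer


def pair_from_index(k, n):
    """Map linear index k in [0, n*(n-1)/2) to a node pair (a, b), a < b.

    Hypothetical integer-only mapping: pairs are laid out on a rectangular
    grid, so no square root (and no floating point) is required. Needs n >= 2.
    """
    m = n if n % 2 == 1 else n - 1         # odd "ring" of m nodes
    half = (m - 1) // 2                    # ring pairs generated per row
    width = half if n % 2 == 1 else half + 1
    row, col = divmod(k, width)
    if col < half:
        a, b = row, (row + col + 1) % m    # pair at ring distance col + 1
    else:
        a, b = row, n - 1                  # even n only: pair with the extra node
    return (a, b) if a < b else (b, a)


def make_tabu_bits(n):
    """Allocate one bit per node, packed into 32-bit words."""
    return [0] * ((n + WORD_BITS - 1) // WORD_BITS)


def set_tabu(bits, node):
    bits[node // WORD_BITS] |= 1 << (node % WORD_BITS)


def clear_tabu(bits, node):
    bits[node // WORD_BITS] &= ~(1 << (node % WORD_BITS))


def is_tabu(bits, node):
    return ((bits[node // WORD_BITS] >> (node % WORD_BITS)) & 1) == 1


def move_is_tabu(bits, k, n):
    """Assumed rule: a 2-flip move is tabu if either flipped node is tabu."""
    a, b = pair_from_index(k, n)
    return is_tabu(bits, a) or is_tabu(bits, b)
```

With one bit per node, the tabu status of 30,000 nodes fits in 938 32-bit words, which is the kind of reduction in transferred data that a bitwise representation aims for.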

[1]  Ke Ding,et al.  A Survey on GPU-Based Implementation of Swarm Intelligence Algorithms , 2016, IEEE Transactions on Cybernetics.

[2]  Fazhi He,et al.  A correlative classifiers approach based on particle filter and sample set for tracking occluded target , 2017 .

[3]  Jürgen Teich,et al.  Hardware/Software Codesign: The Past, the Present, and Predicting the Future , 2012, Proceedings of the IEEE.

[4]  Wu Jigang,et al.  Efficient heuristic and tabu search for hardware/software partitioning , 2013, The Journal of Supercomputing.

[5]  Bin Li,et al.  A hardware/software partitioning algorithm based on artificial immune principles , 2008, Appl. Soft Comput..

[6]  Lucas C. Cordeiro,et al.  Applying SMT-based verification to hardware/software partitioning in embedded systems , 2016, Des. Autom. Embed. Syst..

[7]  Xiao Chen,et al.  Real-time object tracking via compressive feature selection , 2016, Frontiers of Computer Science.

[8]  Y. Karuno,et al.  Heuristic algorithms with rounded weights for a combinatorial food packing problem , 2017 .

[9]  Jing Liu,et al.  Hardware/Software Partitioning for Heterogenous MPSoC Considering Communication Overhead , 2017, International Journal of Parallel Programming.

[10]  Shuming Gao,et al.  A sketch-based semantic retrieval approach for 3D CAD models , 2017, Applied Mathematics-A Journal of Chinese Universities.

[11]  Xiuping Liu,et al.  Laplace operator based multi-channel image filters learning , 2016, Journal of Advanced Mechanical Design, Systems, and Manufacturing.

[12]  Yiqi Wu,et al.  A local start search algorithm to compute exact Hausdorff Distance for arbitrary point sets , 2017, Pattern Recognit..

[13]  Juan Carlos López,et al.  On the hardware-software partitioning problem: System modeling and partitioning techniques , 2003, TODE.

[14]  Gang Wang,et al.  Application partitioning on programmable platforms using the ant colony optimization , 2006, J. Embed. Comput..

[15]  Mohamed B. Abdelhalim,et al.  An integrated high-level hardware/software partitioning methodology , 2011, Des. Autom. Embed. Syst..

[16]  Theerayod Wiangtong,et al.  Comparing Three Heuristic Search Methods for Functional Partitioning in Hardware–Software Codesign , 2002, Des. Autom. Embed. Syst..

[17]  Masatomo Inui,et al.  GPU-based visualization of knee-form contact area for safety inspections , 2016 .

[18]  Fazhi He,et al.  Service-Oriented Feature-Based Data Exchange for Cloud-Based Design and Manufacturing , 2018, IEEE Transactions on Services Computing.

[19]  Pier Luca Lanzi,et al.  Ant Colony Optimization for mapping, scheduling and placing in reconfigurable systems , 2013, 2013 NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013).

[20]  Keijiro Masui,et al.  An application of graph traversal algorithm to design task planning in model-based product development , 2016 .

[21]  Joo-Haeng Lee,et al.  Noiseless GPU rendering of isotropic BRDF surfaces , 2011, The Visual Computer.

[22]  Masatomo Inui,et al.  Fast Detection of Head Colliding Shapes on Automobile Parts , 2013 .

[23]  Zoltán Ádám Mann,et al.  Algorithmic aspects of hardware/software partitioning , 2005, TODE.

[24]  Shin Usuki,et al.  Velocity calculation of 2D geometric objects by use of surface interpolation in 3D , 2014 .

[25]  Jun Sun,et al.  A multiple template approach for robust tracking of fast motion target , 2016, Applied Mathematics-A Journal of Chinese Universities.

[26]  P. Arato,et al.  Hardware-software partitioning in embedded system design , 2003, IEEE International Symposium on Intelligent Signal Processing, 2003.

[27]  Xiao Pan,et al.  Parsing main structures of indoor scenes from single RGB-D image , 2016 .

[28]  Yi Zhou,et al.  Parallel ant colony optimization on multi-core SIMD CPUs , 2018, Future Gener. Comput. Syst..

[29]  Bruno Arnaldi,et al.  Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function , 2012, EGPGV@Eurographics.

[30]  Fazhi He,et al.  Quantitative optimization of interoperability during feature-based data exchange , 2015, Integr. Comput. Aided Eng..

[31]  Soonhung Han,et al.  An efficient approach to directly compute the exact Hausdorff distance for 3D point sets , 2017, Integr. Comput. Aided Eng..

[32]  Yi Zhou,et al.  An adaptive neighborhood taboo search on GPU for Hardware/Software Co-design , 2016, 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD).

[33]  Yi Zhou,et al.  Optimization of parallel iterated local search algorithms on graphics processing unit , 2016, The Journal of Supercomputing.

[34]  Nithiyanantham Janakiraman,et al.  Multi-objective module partitioning design for dynamic and partial reconfigurable system-on-chip using genetic algorithm , 2014, J. Syst. Archit..

[35]  Fazhi He,et al.  Using shapes correlation for active contour segmentation of uterine fibroid ultrasound images in computer-aided therapy , 2016 .

[36]  El-Ghazali Talbi,et al.  GPU Computing for Parallel Local Search Metaheuristic Algorithms , 2013, IEEE Transactions on Computers.

[37]  Mouloud Koudil,et al.  Using artificial bees to solve partitioning and scheduling problems in codesign , 2007, Appl. Math. Comput..

[38]  Hong-Seok Park,et al.  Multi-objective optimization of turning process for hardened material based on hybrid approach , 2016 .

[39]  Kamil Rocki,et al.  Accelerating 2-opt and 3-opt local search using GPU in the travelling salesman problem , 2012, HPCS.

[40]  Byung Chul Kim,et al.  CAD model simplification using feature simplifications , 2016 .

[41]  Ming Li,et al.  An ontology-based semantic retrieval approach for heterogeneous 3D CAD models , 2016, Adv. Eng. Informatics.

[42]  Tatsuhiko Sakaguchi,et al.  Parallel computing for huge scale logistics optimization through binary PSO associated with topological comparison , 2014 .

[43]  Fazhi He,et al.  A Novel Hardware/Software Partitioning Method Based on Position Disturbed Particle Swarm Optimization with Invasive Weed Optimization , 2017, Journal of Computer Science and Technology.

[44]  Cong Wang,et al.  HARDWARE/SOFTWARE PARTITIONING ALGORITHM BASED ON THE COMBINATION OF GENETIC ALGORITHM AND TABU SEARCH , 2014 .

[45]  Yuan Cheng,et al.  A string-wise CRDT algorithm for smart and large-scale collaborative editing systems , 2017, Adv. Eng. Informatics.

[46]  Allan Borodin,et al.  A time-space tradeoff for sorting on a general sequential model of computation , 1980, STOC '80.

[47]  Yutaka Nomaguchi,et al.  Planning method of creative and collaborative design process with prediction model of technical performance and product integrity , 2012, Concurr. Eng. Res. Appl..

[48]  M. Montaz Ali,et al.  A Tabu Search-Based Memetic Algorithm for Hardware/Software Partitioning , 2014 .

[49]  Wu Jigang,et al.  Algorithmic Aspects of Hardware/Software Partitioning: 1D Search Algorithms , 2010, IEEE Transactions on Computers.

[50]  Jörg Henkel,et al.  An approach to automated hardware/software partitioning using a flexible granularity that is driven by high-level estimation techniques , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[51]  Fazhi He,et al.  An Efficient Particle Swarm Optimization for Large-Scale Hardware/Software Co-Design System , 2017, Int. J. Cooperative Inf. Syst..

[52]  Yu Jiang,et al.  Uncertain Model and Algorithm for Hardware/Software Partitioning , 2012, 2012 IEEE Computer Society Annual Symposium on VLSI.

[53]  Yuan Cheng,et al.  Meta-operation conflict resolution for human–human interaction in collaborative feature-based CAD systems , 2016, Cluster Computing.

[54]  Kang Li,et al.  Robust Visual Tracking Based on Convolutional Features with Illumination and Occlusion Handing , 2018, Journal of Computer Science and Technology.

[55]  Yi Zhou,et al.  Dynamic strategy based parallel ant colony optimization on GPUs for TSPs , 2017, Science China Information Sciences.