OPLE: A Heuristic Custom Instruction Selection Algorithm Based on Partitioning and Local Exploration of Application Dataflow Graphs

This article presents a heuristic custom instruction (CI) selection algorithm. The proposed algorithm, called OPLE for “Optimization based on Partitioning and Local Exploration,” combines greedy and optimal optimization methods. It searches for a near-optimal solution by partitioning the identified CI set, which reduces the search space. Partitioning the identified set guarantees the success of the algorithm regardless of the size of that set. First, the algorithm finds the near-optimal CIs among the candidate CIs of each part. Next, the CIs suggested by the different parts are combined to determine the final selected CI set. To improve the selected CI set, the solution is evolved by calling the algorithm iteratively. The efficacy of the algorithm is assessed by comparing its performance with that of optimal and nonoptimal methods. The comparative study covers a number of benchmarks under different area budgets and I/O constraints. The results show that OPLE achieves higher speedups than the nonoptimal solutions, especially for larger identified candidate sets and/or small area budgets. On average, OPLE improves speedup by 30% over the nonoptimal techniques, with a maximum improvement of 117%. The results also demonstrate that in many cases OPLE is able to find the optimal solution.
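The partition, explore locally, combine, and iterate flow described above can be illustrated with a minimal Python sketch. This is not the OPLE algorithm itself. It assumes each candidate CI is an independent `(gain, area)` pair with positive area, ignores overlap between candidates in the dataflow graph and I/O constraints, and uses a fixed-size partitioning. The names `best_subset`, `select`, and `part_size` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def best_subset(items, budget):
    """Exhaustive local exploration: return the subset of (gain, area)
    items with the highest total gain whose total area fits the budget."""
    best, best_gain = (), 0
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            area = sum(a for _, a in combo)
            gain = sum(g for g, _ in combo)
            if area <= budget and gain > best_gain:
                best, best_gain = combo, gain
    return list(best)


def select(candidates, budget, part_size=4):
    """Hypothetical partition-and-combine heuristic (an assumption, not
    the paper's method): solve each small part exactly, merge the
    suggestions, and repeat until one part remains."""
    pool = list(candidates)
    while len(pool) > part_size:
        # Partition the current pool into fixed-size parts.
        parts = [pool[i:i + part_size] for i in range(0, len(pool), part_size)]
        # Each part suggests its own best CIs under the full budget.
        merged = [ci for part in parts for ci in best_subset(part, budget)]
        if len(merged) == len(pool):
            # No reduction: fall back to a greedy gain/area ranking.
            merged.sort(key=lambda ci: ci[0] / ci[1], reverse=True)
            merged = merged[:part_size]
        pool = merged
    # The remaining pool is small enough to solve exactly.
    return best_subset(pool, budget)


candidates = [(10, 5), (6, 4), (7, 3), (3, 2), (8, 6), (4, 1)]
chosen = select(candidates, budget=8, part_size=3)
print(chosen, sum(g for g, _ in chosen))
```

Each exhaustive search is limited to at most `part_size` candidates, so the cost of every local exploration stays bounded however large the candidate set is. The price is that a candidate discarded inside its part can no longer contribute to the final set, which is why the result is a heuristic rather than a guaranteed optimum.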
