Area-efficient instruction set synthesis for reconfigurable system-on-chip designs

Silicon compilers are often used in conjunction with Field Programmable Gate Arrays (FPGAs) to deliver flexibility, fast prototyping, and accelerated time-to-market. Many of these compilers produce hardware that is larger than necessary because they do not allow instructions to share hardware resources. This study presents an efficient heuristic that transforms a set of custom instructions into a single hardware datapath on which all of them can execute. Our approach is based on the classic problems of finding the longest common subsequence and the longest common substring of two (or more) sequences. The heuristic produces circuits that are up to 85.33% smaller than those synthesized by integer linear programming (ILP) approaches that do not explore resource sharing. On average, we obtained a 55.41% area reduction for pipelined datapaths and a 66.92% area reduction for VLIW datapaths. Our solution is simple and effective, and can easily be integrated into an existing silicon compiler.
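
To make the two classic problems named in the abstract concrete, the following is a minimal sketch of the textbook dynamic programs for the longest common subsequence and the longest common substring. It is not the paper's heuristic. The operation names, the two example sequences, and the `merged_unit_count` helper are hypothetical; the helper assumes, purely for illustration, that every operation matched by the common subsequence can be served by one shared hardware unit.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def longest_common_substring(a, b):
    """Return the first longest contiguous run shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def merged_unit_count(a, b):
    """Illustrative assumption: each matched operation shares one unit."""
    return len(a) + len(b) - len(longest_common_subsequence(a, b))


# Hypothetical operation sequences taken from two custom instructions.
path_a = ["mul", "add", "shl", "add"]
path_b = ["add", "mul", "add", "sub"]

shared = longest_common_subsequence(path_a, path_b)
run = longest_common_substring(path_a, path_b)
units = merged_unit_count(path_a, path_b)
```

In this toy case the two sequences contain eight operations in total, and sharing the two matched operations would leave six units under the stated assumption. Both routines run in time proportional to the product of the two sequence lengths.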
