Resource management and task partitioning and scheduling on a run-time reconfigurable embedded system

Hardware-software co-design that pairs a general-purpose microprocessor with a hardware accelerator can improve the performance of data-intensive streaming applications, but it raises several design challenges. The main ones are preventing hardware area fragmentation so that resource utilization increases, reducing hardware reconfiguration cost, and partitioning and scheduling tasks efficiently between the microprocessor and the hardware accelerator so that applications gain performance and save power. In this paper, we propose a modular, block-based hardware configuration architecture named memory-aware run-time reconfigurable embedded system (MARTRES) for efficient resource management and performance improvement of streaming applications. We then design a task placement algorithm, hierarchical best fit ascending (HBFA), to demonstrate that the MARTRES configuration architecture increases resource utilization and is flexible in task mapping and power savings. HBFA runs in O(n) time, compared with O(n^2) for the traditional Best Fit (BF) algorithm, while producing placement solutions of better quality than BF. Finally, we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement-aware partitioning and scheduling algorithm (BPASA). BPASA exploits the temporal parallelism in streaming applications to reduce the hardware reconfiguration cost while respecting the required output data throughput. It balances the exploitation of spatial and temporal parallelism by weighing the reconfiguration cost against the data transfer cost. Before assigning a task to hardware (HW) or software (SW), the scheduler consults the HBFA placement algorithm to check whether a contiguous area is available on the FPGA.
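
For orientation, the traditional Best Fit baseline that HBFA is compared against can be sketched as below. This is a minimal illustration of textbook Best Fit, not of HBFA, whose details the abstract does not give. It assumes a simplified one-dimensional model in which the FPGA area is divided into equal-capacity blocks; the names `best_fit`, `task_sizes`, and `block_capacity` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def best_fit(task_sizes: List[int], block_capacity: int) -> Tuple[List[int], List[int]]:
    """Place each task in the open block that leaves the least free room.

    Assumption: area is modeled in one dimension as equal-capacity blocks.
    Returns (block index chosen per task, remaining capacity per block).
    """
    free: List[int] = []        # remaining capacity of each open block
    assignment: List[int] = []  # block index chosen for each task
    for size in task_sizes:
        if size > block_capacity:
            raise ValueError("task does not fit in a single block")
        best = -1
        # Linear scan over the open blocks; repeated for every task,
        # this gives the O(n^2) worst case of the naive baseline.
        for i, room in enumerate(free):
            if room >= size and (best == -1 or room < free[best]):
                best = i
        if best == -1:
            # No open block can hold the task, so open a new one.
            free.append(block_capacity)
            best = len(free) - 1
        free[best] -= size
        assignment.append(best)
    return assignment, free
```

Calling `best_fit([4, 7, 3, 5, 6], 10)` places the third task in the tighter second block rather than the first. The scan over all open blocks for every task is what makes this naive form quadratic.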
