A Probabilistic Approach for the System-Level Design of Multi-ASIP Platforms

(11/12/2018) Application-Specific Instruction-set Processors (ASIPs) offer a good trade-off between performance and flexibility compared with general-purpose processors or ASICs. Additionally, multiple ASIPs can be combined in a single platform, which allows customized heterogeneous MPSoCs to be generated with a relatively short time-to-market. While several commercial tools exist for the design of a single ASIP, there is still a lack of automation in the design of multi-ASIP platforms.
