From Scilab to multicore embedded systems: Algorithms and methodologies

While advances in processor architecture continue to increase hardware parallelism, creating parallel software remains hard. There is a growing need for tools and methodologies that lower the entry barrier for non-experts in parallel software development and streamline the work of experts. This paper presents the methodology and algorithms for creating parallel software for multicore embedded processors from Scilab source code, in the context of the “Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb” (ALMA) EU FP7 project. In a nutshell, the ALMA parallelization approach attempts to manage the complexity of the task by alternating between highly localized and holistic program optimization strategies.
