From Scilab to High Performance Embedded Multicore Systems: The ALMA Approach

The mapping process of high performance embedded applications to today's multiprocessor system on chip devices suffers from a complex tool chain and programming process. The problem here is the expression of parallelism with a pure imperative programming language which is commonly C. This traditional approach limits the mapping, partitioning and the generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains. The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) European project aims to bridge these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from high level of abstraction. This holistic solution of the toolchain allows the complexity of both the application and the architecture to be hidden, which leads to a better acceptance, reduced development cost, and shorter time-to-market. Driven by the technology restrictions in chip design, the end of exponential growth of clock speeds, and an unavoidable increasing request of computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies.

[1]  Seung-Soon Im,et al.  Tool interface standard (TIS) executable and linking format (ELF) specification , 1995 .

[2]  Ken Kennedy,et al.  Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.

[3]  Krzysztof Kuchcinski,et al.  Evaluation of SIMD architecture enhancement in embedded processors for MPEG-4 , 2004, Euromicro Symposium on Digital System Design, 2004. DSD 2004..

[4]  Scott A. Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 25.

[5]  Franz Franchetti,et al.  Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures , 2011, CC.

[6]  Joseph A. Fisher,et al.  Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.

[7]  Jürgen Becker,et al.  Architecture design space exploration of run-time scalable issue-width processors , 2011, 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[8]  Daniel Ménard,et al.  Floating-to-Fixed-Point Conversion for Digital Signal Processors , 2006, EURASIP J. Adv. Signal Process..

[9]  Jari Nurmi,et al.  CRISP: Cutting Edge Reconfigurable ICs for Stream Processing , 2011 .

[10]  John A. Chandy,et al.  Communication Optimizations Used in the Paradigm Compiler for Distributed-Memory Multicomputers , 1994, 1994 Internatonal Conference on Parallel Processing Vol. 2.

[11]  D. Panda,et al.  Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach , 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).

[12]  B. Ramakrishna Rau,et al.  Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.

[13]  Klaus D. Müller-Glaser,et al.  MORPHEUS: Heterogeneous Reconfigurable Computing , 2007, 2007 International Conference on Field Programmable Logic and Applications.

[14]  Romuald Rocher,et al.  Analytical Fixed-Point Accuracy Evaluation in Linear Time-Invariant Systems , 2008, IEEE Transactions on Circuits and Systems I: Regular Papers.

[15]  Albert Cohen,et al.  Polyhedral-Model Guided Loop-Nest Auto-Vectorization , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[16]  Muhammad Shafique,et al.  KAHRISMA: A Novel Hypermorphic Reconfigurable-Instruction-Set Multi-grained-Array Architecture , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[17]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[18]  Jürgen Becker,et al.  A novel ADL-based compiler-centric software framework for reconfigurable mixed-ISA processors , 2011, 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[19]  Jürgen Becker,et al.  A Scalable Microarchitecture Design that Enables Dynamic Code Execution for Variable-Issue Clustered Processors , 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[20]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.