A rapid design method of a massively parallel System on Chip: from modeling to FPGA implementation (original French title: Méthode de conception rapide d'architecture massivement parallèle sur puce : de la modélisation à l'expérimentation sur FPGA)

The work presented in this thesis is part of ongoing research on the design and implementation of high-performance systems on chip. Its aim is to speed up and simplify the design and deployment of systematic processing applications with massive data parallelism. We define a massively parallel SIMD system on chip named mppSoC (massively parallel processing System on Chip). The system is generic and parametric so that it can be adapted to the application. We propose a rapid, modular design approach for mppSoC, based on assembling components, or IPs. To this end, a library, mppSoCLib, has been set up. The designer can directly select the required components and set the system parameters to build a SIMD configuration that meets their needs. An automated generation chain has been developed. It automatically generates the VHDL code of an mppSoC configuration modeled at a high level of abstraction (UML). The generated VHDL code can be directly simulated and synthesized on FPGA. The chain thus allows a configuration suited to a given application to be defined at a high level of abstraction. Based on simulation of the automatically generated code, the configuration can then be modified in an exploration process that is, for now, semi-automatic. We validate mppSoC in a real FPGA-based video processing application. In the same context, a comparison between mppSoC and other systems shows that mppSoC delivers sufficient performance and is efficient.
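As a rough illustration of the idea of turning a parametric SIMD configuration into VHDL text, the following Python sketch renders a small parameter set as a VHDL package of constants. It is not the thesis's generation chain, which starts from UML models. The class `SimdConfig`, its fields (`num_pes`, `data_width`, `pe_memory_words`), the function `to_vhdl_package`, and the package name are all hypothetical names chosen for this example and are not taken from mppSoC or mppSoCLib.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdConfig:
    """Hypothetical parameter set; field names are assumptions, not mppSoC's."""
    num_pes: int          # number of processing elements
    data_width: int       # bits per data word
    pe_memory_words: int  # local memory size per processing element, in words

    def validate(self) -> None:
        # Reject parameter values that make no sense for this toy model.
        if self.num_pes < 1:
            raise ValueError("num_pes must be at least 1")
        if self.data_width not in (8, 16, 32):
            raise ValueError("data_width must be 8, 16 or 32")
        if self.pe_memory_words < 1:
            raise ValueError("pe_memory_words must be at least 1")


def to_vhdl_package(cfg: SimdConfig, name: str = "simd_config_pkg") -> str:
    """Render the configuration as a VHDL package declaring three constants."""
    cfg.validate()
    lines = [
        f"package {name} is",
        f"  constant NUM_PES         : positive := {cfg.num_pes};",
        f"  constant DATA_WIDTH      : positive := {cfg.data_width};",
        f"  constant PE_MEMORY_WORDS : positive := {cfg.pe_memory_words};",
        f"end package {name};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a 16-element configuration with 32-bit words.
    print(to_vhdl_package(SimdConfig(num_pes=16, data_width=32, pe_memory_words=1024)))
```

Keeping the parameters in one validated object and generating the hardware description from it means that changing a configuration during exploration only requires editing the parameter values and regenerating the text.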
