Brick and mortar chip fabrication

While Moore's Law has advanced the semiconductor and technology industries, it has simultaneously driven up the cost of engineering a chip in a modern silicon process. The result is that fewer and fewer chips are produced in larger and larger volumes, stifling hardware diversity. This thesis introduces brick and mortar chips, which aim to obtain the benefits of Moore's Law without the financial side effects. Brick and mortar chips are made from small, pre-fabricated hardware components (called bricks) that are bonded in a designer-specified arrangement to a communication backbone chip (called the I/O cap), which serves as the mortar. Our research examines several aspects of this chip manufacturing system. We develop a family of functional bricks, demonstrating a methodology for developing families that make efficient use of physical computation and communication resources. For high-performance communication between arbitrary combinations of bricks, we propose a polymorphic on-chip network. This network allows a single I/O cap to be configured to implement the ideal network for any particular application. We analyze a low-cost physical component assembly technique called fluidic self-assembly, and find that the chip production rate is intertwined with the architectural design of the components. To minimize application execution time on these partitioned chips, we develop software partitioning and mapping techniques that balance communication costs against computational resource contention. We close with a case study: an analysis of a brick and mortar implementation of a chip multiprocessor. Although this is a highly latency-sensitive design, our measurements indicate a worst-case average slowdown of 36% in application execution compared to a traditional, monolithic chip. Based on this result, our cost analysis, and a survey of related technologies, we conclude that brick and mortar offers the best available performance for its price.
