A butterfly processor-memory interconnection for a vector processing environment

Abstract A fundamental hurdle impeding the development of large-N common-memory multiprocessors is the performance limitation incurred in the switch connecting the processors to the memory modules. Multistage networks currently considered for this connection have a memory latency that grows like α log₂ N. For scientific computing, it is natural to look for a multiprocessor architecture that enables the use of vector operations to mask memory latency. The problem to be overcome here is the chaotic behavior introduced by conflicts occurring in the switch. In this paper we examine the performance of the butterfly, or indirect binary n-cube, network in a vector processing environment. We describe a simple modification of the standard 2 × 2 switch node used in such networks. This local modification to the switch node endows the network with a surprising global property: it adaptively removes chaotic behavior during a vector operation.
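As a rough illustration of why latency scales with log₂ N and how conflicts arise in the switch, the following sketch models destination-tag routing through an unmodified indirect binary n-cube with N = 2^n lines. It is not the modified switch node described in the paper. The function names, the convention that stage i resolves address bit i (least significant bit first), and the assumption that all requests enter the network in the same cycle are illustrative choices only.

```python
def route(src, dst, n):
    """Return the line label a request occupies after each of the n stages.

    Assumed convention: stage i overwrites bit i of the current line label
    with bit i of the destination address (destination-tag routing).
    """
    pos = src
    path = []
    for i in range(n):
        bit = (dst >> i) & 1
        pos = (pos & ~(1 << i)) | (bit << i)
        path.append(pos)
    return path


def count_conflicts(requests, n):
    """Count excess claims on stage-output links.

    `requests` is a list of (src, dst) pairs assumed to be issued in the
    same cycle. A link claimed by c > 1 requests contributes c - 1.
    """
    conflicts = 0
    for stage in range(n):
        claims = {}
        for src, dst in requests:
            link = route(src, dst, n)[stage]
            claims[link] = claims.get(link, 0) + 1
        conflicts += sum(c - 1 for c in claims.values() if c > 1)
    return conflicts


n = 3                      # 3 stages, so N = 8 processors and 8 memory modules
N = 1 << n

# Hypothetical access patterns: processor p requests module (stride * p) mod N.
unit_stride = [(p, (p + 1) % N) for p in range(N)]   # uniform shift by one
stride_two = [(p, (2 * p) % N) for p in range(N)]    # only even modules hit

shift_conflicts = count_conflicts(unit_stride, n)
stride_conflicts = count_conflicts(stride_two, n)
```

Every request crosses exactly n = log₂ N switch stages, which is the source of the logarithmic latency term. In this toy model the uniform shift passes without contention, whereas the stride-two pattern makes several requests claim the same link at every stage; contention of this kind is what the abstract refers to as conflicts occurring in the switch.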