Design of Direct Communication Facility for Many-Core Based Accelerators

A direct communication facility, called DCFA, is designed and evaluated for a many-core-based cluster whose compute nodes consist of many-core units connected to the host via PCI Express and interconnected by InfiniBand. Because a many-core unit is a device on the PCI Express bus, it cannot configure and initialize the InfiniBand HCA, according to the PCI Express specification. This means the host must assist memory transfers between many-core units, which incurs extra communication overhead. In DCFA, the internal structures of the InfiniBand HCA are distributed between the memory space of the host and that of the many-core unit. After the host CPU configures and initializes the HCA, it passes the addresses of the HCA and of the host-assigned internal structures to the many-core unit. Using this information from the host, together with the internal structures assigned in its own memory area, the many-core unit can transfer data directly to and from other many-core units through the HCA without host assistance. DCFA is implemented on a Mellanox InfiniBand HCA and Intel's Knights Ferry. Preliminary results show that, for large data transfers, DCFA achieves the same latency as host-to-host data transfer.
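
To make the division of labor concrete, the sketch below outlines the hand-off and the many-core-side posting path in C. The structure layouts, field names, and the dcfa_/mic_ identifiers are illustrative assumptions, not the real Mellanox WQE or doorbell formats; the actual DCFA implementation works against the HCA driver's internal structures rather than this simplified interface.

```c
/*
 * Illustrative sketch of the DCFA flow: the host initializes the HCA and
 * publishes a few addresses; the many-core unit then posts transfers on
 * its own. All layouts and names here are simplified inventions.
 */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Addresses the host passes to the many-core unit after it has configured
 * and initialized the HCA (only the host, as PCI Express root, may do so). */
struct dcfa_handoff {
    volatile uint32_t *doorbell;   /* HCA doorbell register, MMIO-mapped     */
    void              *sq_base;    /* send queue placed in many-core memory  */
    uint32_t           sq_entries; /* number of WQE slots in the send queue  */
    uint32_t           qp_num;     /* queue pair number assigned on the host */
};

/* Simplified work-queue entry for an RDMA write (illustrative only). */
struct dcfa_wqe {
    uint64_t local_addr;
    uint64_t remote_addr;
    uint32_t rkey;
    uint32_t length;
};

/* Many-core side: post an RDMA write with no host involvement. The WQE is
 * written into a queue that lives in many-core memory, then the HCA
 * doorbell is rung directly over PCI Express. */
void mic_post_rdma_write(struct dcfa_handoff *h, uint32_t slot,
                         uint64_t laddr, uint64_t raddr,
                         uint32_t rkey, uint32_t len)
{
    struct dcfa_wqe wqe = {
        .local_addr = laddr, .remote_addr = raddr,
        .rkey = rkey, .length = len,
    };
    struct dcfa_wqe *sq = (struct dcfa_wqe *)h->sq_base;

    memcpy(&sq[slot % h->sq_entries], &wqe, sizeof(wqe));
    __sync_synchronize();        /* make the WQE visible before notifying */
    *h->doorbell = h->qp_num;    /* ring the HCA doorbell directly        */
}
```

The point the sketch captures is that, once the host has published these addresses, the many-core unit needs nothing from the host CPU on the data path: it builds work queue entries in its own memory and rings the doorbell itself.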
