Efficient Topology Reconstruction via Machine Learning Based Traffic Patterns Recognition in Optically Interconnected Computing System

The traffic flows in parallel computing systems are clustered and correlated, and they are latency-sensitive. These flows have been abstracted as “Coflow” to pursue overall optimization. Concurrent Coflows on the network show novel traffic patterns. Meanwhile, multiple optical interconnection network architectures have been proposed to enable traffic-adaptive topology reconstruction. Nevertheless, existing topology reconstruction strategies are application-agnostic, and their network-performance optimization objectives cannot meet Coflow demands. To exploit the flexibility of the optical topology and improve the performance of parallel computing applications through Coflow acceleration, the traffic patterns should first be recognized and an adaptive topology then generated accordingly. To avoid further complexity, this recognition should be completed without prior knowledge from the application layer. The topology should then be reconstructed to minimize the Coflow completion time. To implement these procedures, we propose a traffic pattern-aware topology reconstruction strategy. Our strategy first combines a CNN and spectral clustering to achieve traffic pattern awareness, and then uses a genetic search algorithm to find a suitable topology. Based on real traffic traces from Facebook computing applications, large-scale simulations verify the efficiency of the strategy by lowering the completion time of computing jobs. In addition, an experimental demonstration confirms these conclusions.
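As a rough illustration of the clustering step only, the sketch below splits hosts into two groups from a host-to-host traffic matrix using the second eigenvector of the graph Laplacian. It is not the authors' implementation: the CNN stage and the genetic search are omitted, the function name `spectral_bipartition` and the toy traffic matrix are hypothetical, and a two-way split is assumed for simplicity.

```python
import numpy as np


def spectral_bipartition(traffic):
    """Split hosts into two groups from a host-to-host traffic matrix.

    Hypothetical helper for illustration; assumes the traffic graph is connected.
    """
    t = np.asarray(traffic, dtype=float)
    # Symmetrize: treat traffic in either direction as affinity between hosts.
    w = t + t.T
    np.fill_diagonal(w, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    lap = np.diag(w.sum(axis=1)) - w
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(lap)
    # The sign of each entry assigns the host to one of the two groups.
    return vecs[:, 1] > 0


# Toy matrix (made-up values): hosts 0-2 and hosts 3-5 exchange heavy traffic
# within their own group and only light traffic across groups.
traffic = np.full((6, 6), 0.1)
traffic[:3, :3] = 10.0
traffic[3:, 3:] = 10.0
labels = spectral_bipartition(traffic)
print(labels)
```

Which group is labelled `True` is arbitrary, since an eigenvector is only defined up to sign; only the grouping itself is meaningful.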
