Minimize Coflow Completion Time via Joint Optimization of Flow Scheduling and Processor Placement

The recent progress in big data has inspired many data-parallel applications deployed in datacenters. Although data flow scheduling in datacenters has been extensively studied, traditional per-flow optimizations usually perform poorly when transferring a collection of parallel flows, i.e., a coflow. Consequently, how to schedule coflows towards various objectives, e.g., minimizing the coflow completion time, has attracted much attention recently. We notice that existing coflow scheduling studies usually assume a fixed destination for each coflow. Taking advantage of virtualization technology, we argue that the destination can be flexibly placed in the cloud. Therefore, it is essential to jointly optimize coflow scheduling and data processor placement. In this paper, we investigate the problem of coflow completion time minimization with joint consideration of coflow scheduling and data processor placement. We first formally formulate the problem as a mixed integer non-linear programming (MINLP) problem. By linearizing the MINLP, we further propose a relaxation-based heuristic algorithm. Extensive simulation studies validate the high efficiency of our heuristic algorithm.
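To illustrate why the placement of the destination matters for coflow completion time, the following is a minimal sketch under a toy bottleneck model. It is not the MINLP formulation or the heuristic of this paper. The model, the function names (`coflow_completion_time`, `best_placement`), and all numbers are assumptions made for illustration: each flow is limited by a fixed source-to-destination rate, all flows of the coflow share the ingress capacity of the destination, and the coflow finishes when its last flow finishes.

```python
from typing import Dict, Tuple


def coflow_completion_time(
    sizes: Dict[str, float],
    bandwidth: Dict[Tuple[str, str], float],
    ingress: Dict[str, float],
    dest: str,
) -> float:
    """Completion time of one coflow if its processor sits at `dest`.

    Assumed toy model, not the paper's formulation.
    """
    # Each flow is limited by the rate of its own source-to-destination path.
    slowest_flow = max(size / bandwidth[(src, dest)] for src, size in sizes.items())
    # All flows of the coflow share the ingress port of the destination.
    ingress_time = sum(sizes.values()) / ingress[dest]
    # The coflow completes only when its last flow completes.
    return max(slowest_flow, ingress_time)


def best_placement(
    sizes: Dict[str, float],
    bandwidth: Dict[Tuple[str, str], float],
    ingress: Dict[str, float],
) -> Tuple[str, float]:
    """Exhaustively pick the candidate destination with the smallest completion time."""
    times = {d: coflow_completion_time(sizes, bandwidth, ingress, d) for d in ingress}
    best = min(times, key=times.get)
    return best, times[best]


# Hypothetical instance: two sources, two candidate destinations A and B.
sizes = {"s1": 100.0, "s2": 50.0}
bandwidth = {
    ("s1", "A"): 10.0, ("s2", "A"): 10.0,
    ("s1", "B"): 25.0, ("s2", "B"): 10.0,
}
ingress = {"A": 20.0, "B": 30.0}

print(best_placement(sizes, bandwidth, ingress))
```

In this hypothetical instance, a destination fixed at A would give a completion time of 10 time units, while placing the processor at B halves it to 5. Exhaustive search is used here only because the instance is tiny; it does not address the scheduling of multiple coflows that compete for the same links.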