Q-learning based Service Function Chaining using VNF Resource-aware Reward Model

With the advent of the 5G era, networks must be built and managed flexibly to meet rapidly changing service requirements. Software-defined networking (SDN) and network function virtualization (NFV) are key technologies that enable such flexibility by turning networks into software-based systems. NFV in particular virtualizes network functions, runs them on commercial off-the-shelf (COTS) servers, and allows them to be managed dynamically. However, the large number of virtual networks and resources that NFV creates can complicate network management. To address this problem, research on managing complex NFV environments with artificial intelligence (AI) has recently attracted attention. Service function chaining (SFC) is one of the essential NFV technologies, and dynamic networks require efficient SFC paths. In this paper, we propose a method for finding an optimal SFC path that considers both the resource utilization of virtual network functions (VNFs) and VNF placement, using Q-learning, a reinforcement learning (RL) algorithm.
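To make the idea concrete, the following is a minimal sketch of tabular Q-learning that selects one VNF instance per SFC stage. It is not the reward model of this paper, which the abstract does not specify. The topology, the utilization values, the weights W_UTIL and W_HOP, and the helper names reward, train, and greedy_path are all hypothetical. The assumed reward penalizes the CPU utilization of the chosen instance and the hop distance from the previous node, standing in for resource awareness and placement.

```python
import random

# Hypothetical line topology: node name -> position; hop count = position difference.
NODE_POS = {"n1": 0, "n2": 1, "n3": 2}
INGRESS = "n1"

# Hypothetical VNF instances per SFC stage: (hosting node, CPU utilization in [0, 1]).
VNF_INSTANCES = [
    [("n1", 0.8), ("n2", 0.3)],  # stage 0, e.g. firewall
    [("n2", 0.6), ("n3", 0.1)],  # stage 1, e.g. IDS
    [("n1", 0.1), ("n3", 0.7)],  # stage 2, e.g. NAT
]

W_UTIL = 1.0  # assumed weight on VNF resource utilization
W_HOP = 0.2   # assumed weight on hop distance (placement)


def reward(prev_node, next_node, util):
    """Assumed resource-aware reward: busier and more distant instances score lower."""
    hops = abs(NODE_POS[prev_node] - NODE_POS[next_node])
    return -(W_UTIL * util + W_HOP * hops)


def train(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning; a state is (stage, current node), an action is an instance index."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        node = INGRESS
        for stage, instances in enumerate(VNF_INSTANCES):
            values = q.setdefault((stage, node), [0.0] * len(instances))
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(len(instances))
            else:
                action = max(range(len(instances)), key=values.__getitem__)
            next_node, util = instances[action]
            target = reward(node, next_node, util)
            if stage + 1 < len(VNF_INSTANCES):
                # Bootstrap from the best estimated value of the next state.
                next_values = q.setdefault(
                    (stage + 1, next_node), [0.0] * len(VNF_INSTANCES[stage + 1])
                )
                target += gamma * max(next_values)
            # Standard Q-learning update toward the bootstrapped target.
            values[action] += alpha * (target - values[action])
            node = next_node
    return q


def greedy_path(q):
    """Follow the highest-valued action at each stage; returns one instance index per stage."""
    node, path = INGRESS, []
    for stage, instances in enumerate(VNF_INSTANCES):
        values = q.get((stage, node), [0.0] * len(instances))
        action = max(range(len(instances)), key=values.__getitem__)
        path.append(action)
        node = instances[action][0]
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

With these made-up numbers, the learned greedy path should avoid the heavily loaded first-stage instance on n1 even though it is zero hops from the ingress. Changing W_UTIL and W_HOP shifts that trade-off between load and distance.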
