RAIN: Towards Real-Time Core Devices Anomaly Detection Through Session Data in Cloud Network

Core devices are critical components of the cloud network and serve multiple tenants simultaneously. Anomalies in core devices affect network availability for a large number of users and, in turn, reduce cloud providers’ profits. However, monitoring core devices directly requires deploying heartbeat-checking tools on a large number of related components, which is extremely laborious. In this paper, we present RAIN, which reduces the number of devices that need detailed investigation for anomalies. RAIN analyzes the session traffic data exchanged between core devices and the virtual machines they serve. To guarantee near-real-time monitoring, RAIN is designed as a two-step structure and incorporates four feature-based detection methods. RAIN has been deployed in Alibaba’s production cloud network for over 6 months and analyzes terabytes of traffic flow metrics per day.