Grey Fault Detection Method Based on Application Interference Model in Cloud Storage

Existing failure detection techniques for cloud storage systems mainly identify abnormal node states by analyzing log records and the usage trends of physical resources such as CPU, memory, and disk space. Despite considerable research effort in failure detection, some gray faults still go undetected; the effect of memory jitter, for example, is not treated by the system as an abnormal fault. We found that a shortcoming of current fault detection techniques is that they do not consider the performance impact that different applications on the same node have on one another. This performance interference arises because virtualization technology provides only limited isolation of physical resources. We therefore propose a gray fault detection method based on application scenario modeling. The method automatically analyzes the performance interference between applications and establishes a relational model between application performance interference and gray faults for the application scenarios. It then uses this relational model to perceive changes in performance interference in the environment, so that it can locate node failures on its own. The accuracy and timeliness of the proposed method are verified on data collected in a Docker-based virtual storage cluster environment, as well as on the Google cluster data. The method can detect a gray fault in 6.4 seconds with high precision.
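The general idea of relating co-located application interference to gray faults can be illustrated with a minimal sketch. It is not the model proposed in this paper: it assumes, purely for illustration, that the latency of one application depends linearly on the load of a neighboring application on the same node, and that a gray fault shows up as a deviation of more than three standard deviations from that baseline. The function names `fit_interference_model` and `is_gray_fault`, the threshold `k`, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def fit_interference_model(neighbor_load, app_latency):
    """Fit app_latency ~ intercept + slope * neighbor_load by least squares.

    Returns (intercept, slope, sigma), where sigma is the standard
    deviation of the residuals on the fault-free baseline samples.
    The linear form is an illustrative assumption.
    """
    mx, my = mean(neighbor_load), mean(app_latency)
    sxx = sum((x - mx) ** 2 for x in neighbor_load)
    if sxx == 0:
        raise ValueError("neighbor_load must contain at least two distinct values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(neighbor_load, app_latency))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(neighbor_load, app_latency)]
    return intercept, slope, pstdev(residuals)


def is_gray_fault(model, load, latency, k=3.0):
    """Flag a sample whose latency is not explained by normal interference.

    k is an assumed threshold in units of the baseline residual spread.
    """
    intercept, slope, sigma = model
    expected = intercept + slope * load
    return abs(latency - expected) > k * sigma


# Hypothetical fault-free baseline: neighbor load (fraction) vs. latency (ms).
baseline_load = [0.1, 0.2, 0.3, 0.4, 0.5]
baseline_latency = [11.0, 12.1, 12.9, 14.0, 15.0]
model = fit_interference_model(baseline_load, baseline_latency)

print(is_gray_fault(model, 0.3, 13.05))  # latency consistent with the baseline
print(is_gray_fault(model, 0.3, 20.0))   # latency far above what interference explains
```

The point of such a baseline is that a latency increase is only suspicious when the neighbor's load does not account for it, which is the kind of signal that per-node resource thresholds alone would miss.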