Early Detection of Fraud Storms in the Cloud

Cloud computing resources are sometimes hijacked for fraudulent use. While some fraudulent use manifests as small-scale resource consumption, a more serious type of fraud is the fraud storm: an event of large-scale fraudulent use. These events begin when fraudulent users discover new vulnerabilities in the sign-up process, which they then exploit en masse. The ability to detect these storms early is a critical component of any cloud-based public computing system. In this work we analyze telemetry data from Microsoft Azure to detect fraud storms and raise early alerts on sudden increases in fraudulent use. Using machine learning to identify such anomalous events involves two inherent challenges: the scarcity of fraud storms themselves and, at the same time, the high frequency of anomalous events in cloud systems. We compare the performance of a supervised approach with that of an unsupervised, multivariate anomaly detection framework. We further evaluate system performance under two practical considerations: robustness in the presence of missing values, and minimization of the model's data collection period. This paper describes the system as well as the underlying machine learning algorithms. A beta version of the system is deployed and used to continuously control fraud levels in Azure.
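To make the idea of unsupervised, multivariate anomaly detection concrete, the following is a minimal sketch and not the system described in this paper. It assumes two hypothetical telemetry metrics per time step (for example, sign-ups and compute hours), fits a mean and covariance on a baseline window, and flags a new observation when its squared Mahalanobis distance exceeds a chi-square threshold. The function names, the metrics, and the threshold are illustrative assumptions.

```python
from statistics import mean


def fit_baseline(rows):
    """Estimate the mean vector and inverse covariance from (x, y) baseline rows.

    Assumption: exactly two metrics per row, so the 2x2 inverse is closed-form.
    """
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    # Unbiased sample variances and covariance.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("degenerate covariance; baseline metrics are collinear")
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis_sq(point, center, inv):
    """Squared Mahalanobis distance of a 2-D point from the baseline center."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


def is_anomalous(point, center, inv, threshold=13.82):
    """Flag a point whose squared distance exceeds the threshold.

    13.82 is approximately the 0.999 quantile of a chi-square distribution
    with 2 degrees of freedom (-2 * ln(0.001)); the choice is illustrative.
    """
    return mahalanobis_sq(point, center, inv) > threshold


# Hypothetical baseline window of (sign-ups, compute hours) per interval.
baseline = [(10, 5), (12, 6), (11, 5), (9, 4), (10, 6), (11, 4), (12, 5), (9, 5)]
center, inv = fit_baseline(baseline)
storm_like = is_anomalous((40, 30), center, inv)
typical = is_anomalous((10, 5), center, inv)
```

A joint score of this kind reacts to unusual combinations of metrics as well as to unusual single values, which is one reason a multivariate detector can be preferable to independent per-metric thresholds. The sketch does not handle missing values, which the abstract names as a practical concern.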
