Minimizing Execution Costs when Using Globally Distributed Cloud Services

Cloud computing is an emerging technology that gives users on-demand access to computation, storage, data and services from around the world. Cloud service providers, however, charge for these services. In particular, for access to data held on their globally distributed storage edge servers, providers charge according to the user’s location and the amount of data transferred. When data-intensive applications are deployed in a Cloud computing environment, optimizing the cost of transferring data to and from these edge servers is a priority, because data play the dominant role in the application’s execution. In this paper, we formulate a non-linear programming model to minimize the data retrieval and execution cost of data-intensive workflows in Clouds. Our model retrieves data from Cloud storage resources such that the amount of data transferred from each resource is inversely proportional to its communication cost. As an example, we take an ‘intrusion detection’ application, whose data logs are made available from globally distributed Cloud storage servers. We construct the application as a workflow and experiment with Cloud-based storage and compute resources. We compare the cost of multiple executions of the workflow under a solution of our non-linear program against the cost under Amazon CloudFront’s ‘nearest’ single data source selection. Our results show that our model saves three-quarters of the total cost.
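The inverse-proportionality idea in the abstract can be illustrated with a minimal sketch. This is not the paper's non-linear program: it only shows how a fixed volume of data might be split across storage sources so that each share is proportional to the reciprocal of that source's per-GB price. The function names, the source labels and the prices are all hypothetical and chosen for illustration.

```python
def inverse_cost_split(total_gb, cost_per_gb):
    """Split total_gb across sources so each share is proportional to 1/cost.

    cost_per_gb maps a (hypothetical) source name to its price per GB.
    """
    if not cost_per_gb:
        raise ValueError("need at least one source")
    if any(c <= 0 for c in cost_per_gb.values()):
        raise ValueError("costs must be positive")
    # Weight of each source is the reciprocal of its price.
    weights = {name: 1.0 / c for name, c in cost_per_gb.items()}
    norm = sum(weights.values())
    # Normalize the weights so the shares add up to total_gb.
    return {name: total_gb * w / norm for name, w in weights.items()}


def transfer_cost(allocation, cost_per_gb):
    """Total transfer cost of an allocation (GB per source)."""
    return sum(gb * cost_per_gb[name] for name, gb in allocation.items())


# Hypothetical per-GB prices for three storage locations (assumed values).
costs = {"us": 0.10, "eu": 0.20, "asia": 0.40}
split = inverse_cost_split(70.0, costs)
total = transfer_cost(split, costs)
```

With this rule, a source that costs twice as much per GB supplies half as much data, so every source ends up contributing the same amount to the transfer bill. The sketch covers transfer prices only; the model described in the abstract also accounts for execution cost, which is ignored here.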
