A cloud computing system in Windows Azure platform for data analysis of crystalline materials

Cloud computing is attracting the attention of the scientific community. In this paper, we develop a new cloud‐based computing system on the Windows Azure platform that lets users run the Zeolite Structure Predictor (ZSP) model through a Web browser. The ZSP is a novel machine learning approach for classifying zeolite crystals according to their framework type; it can categorize entries from the Inorganic Crystal Structure Database into 41 framework types. The automated system lets a user calculate the vector of descriptors used by the ZSP and apply the model, which uses the Random Forest™ algorithm, to classify the input zeolite entries. The workflow presented here integrates Fortran and Python executables for number crunching with packages such as Weka for data analytics and Jmol for Web‐based atomistic visualization, all within an interactive compute system accessed through the Web. The compute system is robust and easy to use. Communities of scientists, engineers, and students knowledgeable in Windows‐based computing should find this workflow attractive and easy to implement in scientific scenarios where the developer needs to combine heterogeneous components. Copyright © 2012 John Wiley & Sons, Ltd.
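The classification step described above relies on an ensemble method. As a rough illustration only, the following Python sketch shows majority voting over an ensemble of simple classifiers, which is the aggregation idea behind random forests. It is not the ZSP model or the Weka implementation: the descriptor values, thresholds, labels (`type_A`, `type_B`), and all function names are hypothetical and chosen purely for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Descriptor = Sequence[float]            # hypothetical descriptor vector
Classifier = Callable[[Descriptor], str]


def make_stump(index: int, threshold: float, below: str, above: str) -> Classifier:
    """Return a one-split decision rule on a single descriptor component."""
    def stump(x: Descriptor) -> str:
        return below if x[index] < threshold else above
    return stump


def majority_vote(ensemble: List[Classifier], x: Descriptor) -> str:
    """Classify x with the most common label among the ensemble's votes."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]


def classify_entries(entries: Dict[str, Descriptor],
                     ensemble: List[Classifier]) -> Dict[str, str]:
    """Apply the voting ensemble to every named entry."""
    return {name: majority_vote(ensemble, x) for name, x in entries.items()}


# Toy ensemble: thresholds and labels are invented, not taken from the paper.
ENSEMBLE = [
    make_stump(0, 0.5, "type_A", "type_B"),
    make_stump(1, 2.0, "type_A", "type_B"),
    make_stump(2, 10.0, "type_B", "type_A"),
]

# Toy descriptor vectors standing in for computed zeolite descriptors.
entries = {
    "entry_1": (0.2, 1.0, 5.0),
    "entry_2": (0.9, 3.0, 20.0),
}

predictions = classify_entries(entries, ENSEMBLE)
print(predictions)
```

A real random forest would train many decision trees on bootstrap samples with random feature subsets; the sketch keeps only the voting step so that the flow from descriptor vector to predicted label is easy to follow.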
