Design and Implementation of a MapReduce Framework for Executing Standalone Software Packages in Hadoop-based Distributed Environments

Hadoop MapReduce is a programming model for building automatically scalable distributed computing applications. It gives developers an effective environment for automatic parallelization. However, most existing manufacturing systems are difficult to migrate to a MapReduce private cloud because of platform incompatibility and the considerable complexity of rebuilding the system. To increase the efficiency of manufacturing systems while minimizing modifications to existing systems, this thesis designs a framework called MC-Framework: Multi-uses-based Cloudizing-Application Framework. It provides a simple interface through which users can fairly execute requested tasks that rely on traditional standalone software packages in MapReduce-based private cloud environments. Moreover, this thesis focuses on multiuser workloads, for which the default Hadoop scheduling scheme, FIFO, increases delay. Hence, we also propose a new scheduling mechanism, called Job-Sharing Scheduling, to explore and fairly share jobs among the machines in the MapReduce-based private cloud. We then prototype an experimental virtual-metrology module of a manufacturing system as a case study to verify and analyze the proposed MC-Framework. The experimental results indicate that the proposed framework substantially improves time performance compared with the original standalone package.
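The delay that FIFO introduces under multiuser workloads can be illustrated with a minimal sketch. It does not reproduce the thesis's Job-Sharing Scheduling, whose details are not given here; it only contrasts FIFO with a simple per-user round-robin order on a single worker. The user names, job durations, and function names are hypothetical and chosen for illustration.

```python
from collections import OrderedDict, deque


def fifo_order(jobs):
    """Return jobs in arrival order. jobs is a list of (user, duration)."""
    return list(jobs)


def round_robin_order(jobs):
    """Interleave jobs across users: one job per user per round.

    This is a generic fair-sharing stand-in, not the thesis's algorithm.
    """
    queues = OrderedDict()
    for user, duration in jobs:
        queues.setdefault(user, deque()).append((user, duration))
    order = []
    while queues:
        # Iterate over a copy of the keys so emptied queues can be removed.
        for user in list(queues):
            order.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]
    return order


def first_completion_per_user(order):
    """Time at which each user's first job finishes on a single worker."""
    clock = 0
    first_done = {}
    for user, duration in order:
        clock += duration
        first_done.setdefault(user, clock)
    return first_done


# Hypothetical workload: one user submits three long jobs before
# another user submits a single short job.
jobs = [("alice", 10), ("alice", 10), ("alice", 10), ("bob", 2)]

fifo_result = first_completion_per_user(fifo_order(jobs))
fair_result = first_completion_per_user(round_robin_order(jobs))
print("FIFO:", fifo_result)
print("Round-robin:", fair_result)
```

Under FIFO the second user's short job waits behind every job submitted earlier, whereas the interleaved order lets it finish after only one long job. A scheduler for a real cluster would also have to account for multiple machines and task slots, which this sketch omits.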
