Application of Hadoop MapReduce technique to Virtual Database system design

In today's world of cloud and grid computing, integrating data from heterogeneous databases is inevitable. Virtual Database (VDB) technology is one effective solution for integrating data from heterogeneous sources, but the task becomes complex when the databases are very large. MapReduce is a framework designed specifically for processing huge datasets on distributed sources, and Apache Hadoop is an open-source implementation of it. So far, Hadoop has been applied successfully to file-based datasets. This paper proposes using the parallel and distributed processing capability of Hadoop MapReduce to execute heterogeneous queries over large datasets. A Virtual Database engine built on top of Hadoop can therefore provide effective, high-performance distributed data integration.
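To show how a heterogeneous query might map onto the MapReduce model, here is a minimal single-process sketch in Python. It is not Hadoop's API and not the engine proposed in this paper. The two sources (`SQL_ROWS`, a relational-style table, and `DOC_RECORDS`, a document-style store), their schemas, and the per-source mappers are all hypothetical. Each source gets its own mapper that translates its native record format into a common `(key, value)` pair. A shuffle step groups the values by key, and a reducer answers the integrated query, here the total amount per customer across both sources.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical heterogeneous sources describing the same entities with different schemas.
SQL_ROWS = [("c1", "Alice", 120.0), ("c2", "Bob", 80.0)]        # (id, name, amount)
DOC_RECORDS = [{"customer": "c1", "amount": 30.0},
               {"customer": "c3", "amount": 50.0}]               # document-style records

Mapper = Callable[[object], Iterable[Tuple[str, float]]]


def map_sql(row) -> Iterable[Tuple[str, float]]:
    """Translate a relational tuple into the common (customer_id, amount) pair."""
    cust_id, _name, amount = row
    yield cust_id, amount


def map_doc(doc) -> Iterable[Tuple[str, float]]:
    """Translate a document record into the same common (customer_id, amount) pair."""
    yield doc["customer"], doc["amount"]


def run_mapreduce(inputs: List[Tuple[Iterable[object], Mapper]],
                  reducer: Callable[[str, List[float]], float]) -> Dict[str, float]:
    """Sequentially simulate map, shuffle (group by key) and reduce.

    In Hadoop the mappers would run in parallel near each data split and the
    shuffle would be distributed; here everything runs in one process.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for records, mapper in inputs:                # map phase, one mapper per source
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)         # shuffle: group values by key
    # Reduce phase: one reducer call per key, keys emitted in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def total_amount(_key: str, values: List[float]) -> float:
    return sum(values)


result = run_mapreduce([(SQL_ROWS, map_sql), (DOC_RECORDS, map_doc)], total_amount)
print(result)  # {'c1': 150.0, 'c2': 80.0, 'c3': 50.0}
```

The design point this sketch tries to capture is that schema heterogeneity is absorbed in the per-source map functions. The reduce side then only ever sees the common key/value format, which is what would let a virtual database layer reuse a generic MapReduce runtime.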
