Towards a framework for large-scale multimedia data storage and processing on Hadoop platform

Cloud computing techniques take the form of distributed computing by utilizing multiple computers to execute computing simultaneously on the service side. To process the increasing quantity of multimedia data, numerous large-scale multimedia data storage computing techniques in the cloud computing have been developed. Of all the techniques, Hadoop plays a key role in the cloud computing. Hadoop, a computing cluster formed by low-priced hardware, can conduct the parallel computing of petabytes of multimedia data. Hadoop features high-reliability, high-efficiency, and high-scalability. The numerous large-scale multimedia data computing techniques include not only the key core techniques, Hadoop and MapReduce, but also the data collection techniques, such as File Transfer Protocol and Flume. In addition, distributed system configuration allocation, automatic installation, and monitoring platform building and management techniques are all included. As a result, only with the integration of all the techniques, a reliable large-scale multimedia data platform can be offered. In this paper, we introduce how cloud computing can make a breakthrough by proposing a multimedia social network dataset on Hadoop platform and implementing a prototype version. Detailed specifications and design issues are discussed as well. An important finding of this article is that we can save more time if we conduct the multimedia social networking analysis using Cloud Hadoop Platform rather than using a single computer. The advantages of cloud computing over the traditional data processing practices are fully demonstrated in this article. The applicable framework designs and the tools available for the large-scale data processing are also proposed. We show the experimental multimedia data including data sizes and processing time.

[1]  Todd Eavis,et al.  Towards a Scalable, Performance-Oriented OLAP Storage Engine , 2012, DASFAA.

[2]  Michael Stonebraker,et al.  MapReduce and parallel DBMSs: friends or foes? , 2010, CACM.

[3]  Tin Yu Wu,et al.  Cloud-based image processing system with priority-based data distribution mechanism , 2012, Comput. Commun..

[4]  Robert L. Grossman,et al.  Malstone: towards a benchmark for analytics on large data clouds , 2010, KDD '10.

[5]  Erik Thomsen,et al.  OLAP Solutions - Building Multidimensional Information Systems , 1997 .

[6]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[7]  Shivnath Babu,et al.  Towards automatic optimization of MapReduce programs , 2010, SoCC '10.

[8]  Tin Yu Wu,et al.  A novel 3D streaming protocol supported multi-mode display , 2011, J. Netw. Comput. Appl..

[9]  Wil M.P. van der Aalst Process Mining: Overview and Opportunities , 2012, TMIS.

[10]  GhemawatSanjay,et al.  The Google file system , 2003 .

[11]  Laks V. S. Lakshmanan,et al.  QC-trees: an efficient summary structure for semantic OLAP , 2003, SIGMOD '03.

[12]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[13]  Jared Flatow,et al.  Disco: a computing platform for large-scale data analytics , 2011, Erlang '11.

[14]  Nandan Mirajkar,et al.  Implementation of Private Cloud using Eucalyptus and an open source Operating System , 2012, ArXiv.

[15]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[16]  Ronald C. Taylor An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics , 2010, BMC Bioinformatics.

[17]  Rajiv Ramnath,et al.  Towards building large-scale distributed systems for twitter sentiment analysis , 2012, SAC '12.