Hadoop Image Processing Framework

With the rapid growth of social media, the number of images uploaded to the internet is exploding. Massive quantities of images are shared through multi-platform services such as Snapchat, Instagram, Facebook, and WhatsApp; recent studies estimate that over 1.8 billion photos are uploaded every day. However, applications that exploit this vast body of data have largely yet to emerge. Most current image processing tools are designed for small-scale, local computation and do not scale to web-sized problems, with their heavy demands on computational resources and storage. Processing frameworks such as Hadoop MapReduce address this by providing a platform for computationally intensive data processing and distributed storage. However, learning the technical complexities of developing useful applications with Hadoop requires a large investment of time and experience on the part of the developer. As a result, the pool of researchers and programmers with the varied skills needed to build applications over large image collections has remained small. To address this, we have developed the Hadoop Image Processing Framework, a Hadoop-based library that supports large-scale image processing. Its main aim is to let developers of image processing applications leverage the Hadoop MapReduce platform without having to master its technical details, and without introducing an additional source of complexity and error into their programs.
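The MapReduce model that such a framework builds on can be illustrated with a small, self-contained sketch (a conceptual simulation in plain Python, not the framework's actual API; the function names and the in-memory "shuffle" are illustrative assumptions, since a real job would read image bytes from HDFS and run mappers and reducers on separate cluster nodes):

```python
from collections import defaultdict

def map_image(image_id, pixels):
    """Mapper: emit one (intensity_bucket, 1) pair per pixel.

    Illustrative stand-in for per-image work a framework user would
    write; 0-255 intensities are bucketed into 4 coarse bins.
    """
    for p in pixels:
        yield (p // 64, 1)

def reduce_counts(bucket, counts):
    """Reducer: sum all counts that share one intensity bucket."""
    return bucket, sum(counts)

def run_job(images):
    """Simulate the shuffle/sort phase Hadoop performs between
    the map and reduce stages, grouping values by key."""
    grouped = defaultdict(list)
    for image_id, pixels in images.items():
        for key, value in map_image(image_id, pixels):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())

# Tiny stand-in "images": lists of pixel intensities keyed by filename.
images = {"a.png": [0, 10, 200, 255], "b.png": [64, 70, 130]}
histogram = run_job(images)  # a global intensity histogram across all images
```

The appeal of hiding this machinery in a library is that the user writes only the map and reduce functions, while input splitting, image decoding, shuffling, and fault tolerance are handled by Hadoop underneath.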
