Applying Hadoop's MapReduce framework on clustering the GPS signals through cloud computing

Year after year, we are witnessing a dramatic increase in the size of data gathered from machines and from human interactions. Machine-generated data is typically massive and complex, and it comes in a wide variety of forms, including climate readings from sensors, posts shared on social media sites, videos posted online, digital pictures, transaction records of online purchases, and cell phone GPS signals. Not surprisingly, machines generate more data than humans do. Sensor data from transportation, logistics, retail, utilities, and telecommunications is continuously generated by devices ranging from fleet GPS transceivers and RFID tag readers to smart meters and cell phones. Such data is frequently processed with parallel methods to optimize operations and to drive operational business intelligence (BI) systems that identify immediate business opportunities. MapReduce is well suited to this task: it is both a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. In this paper, we enhanced Hadoop MapReduce for data-intensive computing on massive datasets of GPS signals. We also developed an execution framework for large-scale data processing through a cloud system in order to reduce the execution time of the cluster.
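The abstract does not say which clustering algorithm the framework uses. Purely as an illustration, the sketch below expresses k-means (a swapped-in technique, not necessarily the authors' method) in MapReduce style, simulating the map, shuffle, and reduce phases in-process in plain Python rather than on Hadoop. The function names `mapper`, `reducer`, `run_iteration`, and `kmeans` are hypothetical, and distances are squared Euclidean distances on raw latitude/longitude degrees, which is only a rough approximation for points spread over a small area.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def mapper(point: Point, centroids: List[Point]) -> Iterator[Tuple[int, Tuple[float, float, int]]]:
    """Map step: assign one GPS fix to its nearest centroid and emit a partial sum."""
    lat, lon = point
    # Assumption: squared Euclidean distance on raw degrees, acceptable only
    # for points within a small geographic area.
    nearest = min(
        range(len(centroids)),
        key=lambda i: (lat - centroids[i][0]) ** 2 + (lon - centroids[i][1]) ** 2,
    )
    yield nearest, (lat, lon, 1)


def reducer(key: int, values: Iterable[Tuple[float, float, int]]) -> Tuple[int, Point]:
    """Reduce step: average every point assigned to one centroid."""
    sum_lat = sum_lon = 0.0
    count = 0
    for lat, lon, n in values:
        sum_lat += lat
        sum_lon += lon
        count += n
    return key, (sum_lat / count, sum_lon / count)


def run_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """Simulate one MapReduce job in-process: map, shuffle (group by key), reduce."""
    groups: Dict[int, List[Tuple[float, float, int]]] = defaultdict(list)
    for p in points:
        for key, value in mapper(p, centroids):
            groups[key].append(value)  # shuffle: group intermediate pairs by key
    new_centroids = list(centroids)  # a centroid with no points keeps its position
    for key, values in groups.items():
        k, c = reducer(key, values)
        new_centroids[k] = c
    return new_centroids


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 20) -> List[Point]:
    """Driver loop: one simulated MapReduce job per iteration until centroids stop moving."""
    for _ in range(max_iter):
        updated = run_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical GPS fixes forming two groups.
    fixes = [(13.75, 100.50), (13.76, 100.51), (13.74, 100.49), (18.79, 98.98), (18.80, 98.99)]
    print(kmeans(fixes, [(13.0, 100.0), (19.0, 99.0)]))
```

On an actual Hadoop cluster, `mapper` and `reducer` would correspond to the map and reduce functions of a job, and the `kmeans` driver would launch one job per iteration, making the current centroids available to every mapper (for example through the distributed cache). Because the reducer only needs sums and counts, a combiner that pre-sums the `(lat, lon, 1)` triples on each node could reduce shuffle traffic.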
