IBRIDIA: A hybrid solution for processing big logistics data

Abstract: The Internet of Things (IoT) is leading to a paradigm shift within the logistics industry. Logistics service providers use sensor technologies such as GPS or telemetry to track and manage their shipment processes. Additionally, they use external data that contain critical information about events such as traffic, accidents, and natural disasters. Correlating data from different sensors and social media and analyzing it in real time provides opportunities to predict events and prevent unexpected delivery delays at run-time. However, collecting and processing data from heterogeneous sources poses problems due to the variety and velocity of the data. In addition, processing data in real time is so challenging that conventional logistics information systems cannot handle it. In this paper, we present a hybrid framework for processing massive volumes of data in both batch style and real time. Our framework is built upon Johnson’s hierarchical clustering (HCL) algorithm, which produces a dendrogram that represents different clusters of data objects.
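To make the clustering step concrete, the following is a minimal sketch of agglomerative hierarchical clustering in the style usually attributed to Johnson: start with every object in its own cluster, repeatedly merge the two closest clusters, and record each merge. The recorded merges form the dendrogram. The abstract does not describe how IBRIDIA implements or adapts the algorithm, so the function name `johnson_hcl`, the Euclidean distance, the single-linkage default, and the sample coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def johnson_hcl(points, linkage=min):
    """Agglomerative clustering sketch; returns the merge list (a dendrogram).

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Both the name and the defaults are illustrative assumptions.
    """
    # Each cluster id maps to the indices of the points it contains.
    clusters = {i: [i] for i in range(len(points))}
    merges = []  # entries: (cluster_a, cluster_b, distance, new_cluster_id)
    next_id = len(points)

    def cluster_distance(a, b):
        # Distance between two clusters under the chosen linkage rule.
        return linkage(
            dist(points[i], points[j])
            for i in clusters[a]
            for j in clusters[b]
        )

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(*pair),
        )
        d = cluster_distance(a, b)
        # Merge them under a fresh id and record the step.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


# Hypothetical 2-D positions, e.g. projected shipment coordinates.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = johnson_hcl(points)
for a, b, d, new_id in merges:
    print(f"merge {a} + {b} -> {new_id} at distance {d:.3f}")
```

Each tuple in `merges` is one level of the dendrogram, so cutting the list at a chosen distance threshold yields a flat set of clusters. This naive version recomputes pairwise distances on every pass and is only meant to show the procedure; it is not suited to large or streaming data.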
