A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

Due to the massive data increase in different Internet of Things (IoT) domains such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical analytics tools for analyzing IoT data. Among these technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data remains unclear. Furthermore, since Big Data technologies are still at an initial stage in many IoT domains, it is valuable to identify and structure the research challenges between Big Data and IoT. Therefore, this paper starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to broader IoT contexts such as IoT dynamics and IoT mobile networks. Finally, this paper identifies a set of research challenges that together form a research roadmap for Big Data research in IoT domains. The proposed research roadmap aims at bridging the research gaps between Big Data and various IoT contexts.
