Big IoT Data Analytics: Architecture, Opportunities, and Open Research Challenges

Voluminous amounts of data have been produced over the past decade as the miniaturization of Internet of Things (IoT) devices has advanced. However, such data are not useful without analytic power. Numerous big data, IoT, and analytics solutions have enabled people to obtain valuable insight into the large volumes of data generated by IoT devices. However, these solutions are still in their infancy, and the domain lacks a comprehensive survey. This paper investigates the state-of-the-art research efforts directed toward big IoT data analytics. The relationship between big data analytics and IoT is explained. Moreover, this paper adds value by proposing a new architecture for big IoT data analytics. Furthermore, big IoT data analytic types, methods, and technologies for big data mining are discussed. Numerous notable use cases are also presented. Several opportunities brought by data analytics in the IoT paradigm are then discussed. Finally, open research challenges, such as privacy, big data mining, visualization, and integration, are presented as future research directions.
