Machine learning for Internet of Things data analysis: A survey

Rapid developments in hardware, software, and communication technologies have facilitated the emergence of Internet-connected sensory devices that provide observations and data measurements from the physical world. By 2020, it is estimated that the total number of Internet-connected devices in use will be between 25 and 50 billion. As these numbers grow and the technologies mature, the volume of published data will increase. The technology of Internet-connected devices, referred to as the Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by its velocity, in terms of time and location dependency, by a variety of modalities, and by varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges presented by IoT data, considering smart cities as the main use case. The key contribution of this study is the presentation of a taxonomy of machine learning algorithms that explains how different techniques are applied to the data in order to extract higher-level information. The potential and challenges of machine learning for IoT data analytics are also discussed. A use case applying a Support Vector Machine (SVM) to Aarhus smart city traffic data is presented for a more detailed exploration.
