Automation of feature engineering for IoT analytics

This paper presents an approach for automating interpretable feature selection for Internet of Things Analytics (IoTA) using machine learning (ML) techniques. The authors conducted a survey of practitioners involved in different IoTA-based application development tasks. The survey reveals that feature selection is the most time-consuming part of the entire workflow and the one that demands the most niche skills. This paper shows how feature selection can be automated without sacrificing decision-making accuracy, thereby reducing both project completion time and the cost of hiring expensive resources. The overall approach is designed by following several pattern recognition principles and state-of-the-art (SoA) ML techniques. Three data sets are used to establish the proof of concept. Experimental results show that the proposed automation reduces the time for feature selection to 2 days, compared with the 4 to 6 months that would have been required without it. This reduction in time is achieved without any loss of accuracy in the decision-making process. The proposed method is also compared against a Multi-Layer Perceptron (MLP) model, as most state-of-the-art work on IoTA uses MLP-based deep learning. Moreover, the feature selection method is compared against an SoA feature reduction technique, namely Principal Component Analysis (PCA), and its variants. The results obtained show that the proposed method is effective.
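To make the idea of interpretable feature selection concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a simple filter that ranks named features by the absolute Pearson correlation with a binary label and keeps the top k. The feature names (`rms_vibration`, `peak_frequency`, `noise_a`, `noise_b`), the synthetic data, and the helper `select_top_k` are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (var_x * var_y) ** 0.5


def select_top_k(features, labels, k):
    """Rank named features by |correlation with the label|; keep the top k.

    Hypothetical helper: a plain correlation filter, used here as a
    stand-in for whatever selection criterion a real pipeline applies.
    """
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic data (assumption): two informative features, two pure-noise ones.
rng = random.Random(0)
n = 200
labels = [rng.randint(0, 1) for _ in range(n)]
features = {
    "rms_vibration": [2.0 * y + rng.gauss(0, 0.5) for y in labels],
    "peak_frequency": [1.0 * y + rng.gauss(0, 1.0) for y in labels],
    "noise_a": [rng.gauss(0, 1) for _ in range(n)],
    "noise_b": [rng.gauss(0, 1) for _ in range(n)],
}

selected, scores = select_top_k(features, labels, k=2)
for name in selected:
    print(f"{name}: |r| = {scores[name]:.3f}")
```

The selected features keep their original names, so a domain expert can inspect them directly. A PCA projection, by contrast, yields components that are linear mixtures of all input features, which is one reason a selection method may be preferred when interpretability matters.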
