Robust histogram-based feature engineering of time series data

Collecting data at regular time intervals is now ubiquitous. The most widely collected and analyzed types of time series are financial data and sensor readings. Many businesses have realized that financial time series analysis is a powerful analytical tool that can lead to competitive advantages. Likewise, sensor networks generate time series that, when properly analyzed, can give a better understanding of the processes being monitored. In this paper we propose a novel, generic histogram-based method for feature engineering of time series data. The preprocessing phase consists of several steps: deseasonalizing the time series, modeling the speed of change with first derivatives, and finally calculating histograms. These steps serve a three-fold goal: to achieve invariance to different factors, to model the data well, and to perform significant feature reduction. We applied this method to the AAIA Data Mining Competition 2015, which was concerned with recognizing activities carried out by firefighters from body sensor network readings. Our solution took third place with a predictive accuracy of about 83%, roughly 1% below the winning solution.
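The three preprocessing steps can be illustrated with a minimal sketch. The abstract does not specify how deseasonalization, differentiation, or binning are done, so everything here is an assumption: the seasonal profile is estimated as the mean value at each phase of a known period `period`, the speed of change is approximated with forward differences, and the histogram uses fixed edges `[lo, hi]` with out-of-range values clipped into the edge bins and counts normalized to relative frequencies. The function names (`deseasonalize`, `first_derivative`, `histogram`, `histogram_features`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def deseasonalize(series: Sequence[float], period: int) -> List[float]:
    """Subtract a seasonal profile estimated as the mean at each phase.

    Assumption: the season length `period` is known in advance.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    sums = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(series):
        sums[i % period] += x
        counts[i % period] += 1
    profile = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [x - profile[i % period] for i, x in enumerate(series)]


def first_derivative(series: Sequence[float], dt: float = 1.0) -> List[float]:
    """Approximate the speed of change with forward differences."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Relative-frequency histogram over fixed edges [lo, hi].

    Values outside the range are clipped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clip out-of-range values
        counts[idx] += 1
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * bins


def histogram_features(series: Sequence[float], period: int,
                       lo: float, hi: float, bins: int) -> List[float]:
    """Pipeline: deseasonalize -> first derivative -> histogram."""
    deseasoned = deseasonalize(series, period)
    speed = first_derivative(deseasoned)
    return histogram(speed, lo, hi, bins)


if __name__ == "__main__":
    # A purely periodic signal: once the seasonal profile is removed,
    # every derivative is 0 and falls into the bin that starts at 0.
    print(histogram_features([0, 1, 0, 1, 0, 1, 0, 1], period=2, lo=-1.0, hi=1.0, bins=4))
```

The output always has `bins` entries regardless of the series length, which is one way the histogram step reduces features. Normalizing the counts makes series of different lengths comparable, and differencing removes any constant offset.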
