Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh - A Python package)

Abstract

Time series feature engineering is a time-consuming process because scientists and engineers have to consider the multifarious algorithms of signal processing and time series analysis to identify and extract meaningful features from time series. The Python package tsfresh (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests) accelerates this process by combining 63 time series characterization methods, which by default compute a total of 794 time series features, with feature selection on the basis of automatically configured hypothesis tests. By identifying statistically significant time series characteristics at an early stage of the data science process, tsfresh closes feedback loops with domain experts and fosters the early development of domain-specific features. The package implements the standard APIs of time series and machine learning libraries (e.g. pandas and scikit-learn) and is designed both for exploratory analyses and for straightforward integration into operational data science applications.
