Fast Automatic Feature Selection for Multi-Period Sliding Window Aggregate in Time Series

As one of the most well-known artificial feature samplers, the sliding window is widely used wherever spatial or temporal structure exists, such as computer vision, natural language processing, data streams, and time series. Among these, time series arise in many scenarios, including credit card payments, user behavior, and sensor data. General feature selection over sliding-window aggregate features requires a time-consuming iteration to first generate the features, after which traditional feature selection methods are employed to rank them. The choice of the key parameter, i.e., the period of the sliding window, depends on domain knowledge and demands tedious trial and error. Currently, there is no automatic method for selecting sliding-window aggregate features. Because generating features over many combinations of periods and sliding windows is extremely expensive, it is impractical to enumerate them all and then select among them. In this paper, we propose a general framework based on Markov chains to solve this problem. The framework is efficient and accurate, so it can perform feature selection over a large space of features and period options. We present the details for 2 common sliding windows and 3 types of aggregation operators, and the framework is easily extended to additional sliding windows and aggregation operators by drawing on existing Markov chain theory.
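To make the cost concrete, the enumeration the abstract describes can be sketched in a few lines. This is a minimal, hypothetical illustration (the function name, window choice, and toy data are assumptions, not the paper's method): each combination of a window period and an aggregation operator yields one candidate feature, so the candidate count multiplies as periods and operators are added.

```python
from statistics import mean

def sliding_window_features(series, periods, aggregators):
    """Generate one aggregate feature per (period, aggregator) pair.

    For each period p, aggregate the most recent p observations of
    `series`. The number of candidate features is
    len(periods) * len(aggregators), which is why enumerating all
    options before selection quickly becomes expensive.
    """
    features = {}
    for p in periods:
        window = series[-p:]  # the trailing window of length p
        for name, agg in aggregators.items():
            features[f"{name}_last_{p}"] = agg(window)
    return features

# Toy daily payment amounts (hypothetical data).
payments = [12.0, 0.0, 3.5, 40.0, 7.5, 0.0, 22.0]
aggs = {"sum": sum, "max": max, "mean": mean}

feats = sliding_window_features(payments, periods=[3, 7], aggregators=aggs)
# 2 periods x 3 aggregators -> 6 candidate features.
```

With dozens of periods, several window shapes, and more operators, the product grows far beyond what is feasible to generate and rank exhaustively, which motivates an analytical selection framework instead of brute-force enumeration.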
