Applying a kernel function on time-dependent data to provide supervised-learning guarantees

We employ a Monte-Carlo approach to find the best phase space for a given data stream. We propose kFTCV, a novel approach to validate data stream classification. Results show Takens' theorem can transform data streams into independent states. Therefore, we can rely on the SLT framework to ensure learning when dealing with data streams.

Statistical Learning Theory (SLT) defines five assumptions that must hold to guarantee learning for supervised algorithms. Data independence is one of them, since SLT relies on the Law of Large Numbers to derive its learning bounds. Consequently, this assumption imposes a strong limitation on guaranteeing learning in time-dependent scenarios. To tackle this issue, some researchers relax the assumption, at the cost of invalidating all theoretical results provided by SLT. In this paper we apply a kernel function, more precisely Takens' immersion theorem, to reconstruct time-dependent open-ended sequences of observations, also referred to as data streams in the context of Machine Learning, into multidimensional spaces (a.k.a. phase spaces) in an attempt to hold the data independence assumption. First, we study the best immersion parameterization for our kernel function using Distance-Weighted Nearest Neighbors (DWNN). Next, we use this best immersion to recursively forecast the next observations up to the prediction horizon, estimated using the Lyapunov exponent. Afterwards, predicted observations are compared against the expected ones using the Mean Distance from the Diagonal Line (MDDL). Theoretical and experimental results based on a cross-validation strategy provide strong evidence of generalization, which allows us to conclude that one can learn from time-dependent data after applying our approach. This opens up an important possibility for ensuring supervised learning on time-dependent data, useful in applications such as climate, animal tracking, biology, and other domains.
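The phase-space reconstruction at the core of this pipeline can be illustrated with a minimal sketch of Takens' delay embedding. This is not the authors' implementation; the embedding dimension `m` and time delay `tau` below are illustrative placeholders (the paper selects them via Monte-Carlo search guided by DWNN):

```python
import numpy as np

def takens_embedding(series, m, tau):
    """Reconstruct a phase space from a 1-D series via Takens' delay embedding.

    Each row is a reconstructed state
    x_t = (s_t, s_{t+tau}, ..., s_{t+(m-1)*tau}).
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - (m - 1) * tau  # number of complete states
    if n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Stack m delayed copies of the series as the coordinates of each state.
    return np.column_stack([series[i * tau : i * tau + n] for i in range(m)])

# Example: embed a sine wave in a 2-D phase space.
t = np.linspace(0, 8 * np.pi, 400)
states = takens_embedding(np.sin(t), m=2, tau=25)
print(states.shape)  # (375, 2)
```

Each row of `states` is one point in the reconstructed multidimensional space; under suitable `m` and `tau`, consecutive rows behave as (approximately) independent states, which is what the paper leverages to recover the SLT data-independence assumption.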
