Generic Bounds on the Maximum Deviations in Sequential/Sequence Prediction (and the Implications in Recursive Algorithms and Learning/Generalization)

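Consider training data given as input/output pairs $(x_i, y_i)$, $i = 0, \ldots, k$, where $x_i \in \mathbb{R}$ is the input and $y_i \in \mathbb{R}$ is the output. Let the test input/output pair be $(x_{\text{test}}, y_{\text{test}})$, and denote the “prediction” (extrapolation/interpolation...) of $y_{\text{test}}$ by $\hat{y}_{\text{test}} = f(x_{\text{test}})$, where $f(\cdot)$ can be any learning algorithm. Since the parameters of $f(\cdot)$ are trained using $(x_i, y_i)$, $i = 0, \ldots, k$, the prediction can ultimately be written as $\hat{y}_{\text{test}} = f(x_{\text{test}}) = g(x_{\text{test}}, y_{0,\ldots,k}, x_{0,\ldots,k})$. Then, for any learning algorithm $f(\cdot)$,

$$D_{\max}(y_{\text{test}} - \hat{y}_{\text{test}}) \geq 2^{h(y_{\text{test}} \mid x_{\text{test}}, y_{0,\ldots,k}, x_{0,\ldots,k}) - 1},$$

where $D_{\max}(\cdot)$ denotes the maximum deviation, $h(\cdot \mid \cdot)$ the conditional differential entropy (in bits), and $I(\cdot\,;\cdot)$ the mutual information. Equality holds iff $y_{\text{test}} - \hat{y}_{\text{test}}$ is uniform and $I(y_{\text{test}} - \hat{y}_{\text{test}}; x_{\text{test}}, y_{0,\ldots,k}, x_{0,\ldots,k}) = 0$.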
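As a rough numerical illustration of the bound, the sketch below evaluates $2^{h-1}$ for two prediction-error densities supported on $[-D, D]$. It rests on assumptions that are not part of the original statement: the error is taken to be independent of the conditioning data, so the conditional entropy in the bound equals the entropy of the error itself, and the function names and the value `D = 3.0` are hypothetical choices made only for this example. The closed-form entropies used are the standard ones for a uniform density and a symmetric triangular density.

```python
import math


def bound_from_entropy_bits(h_bits):
    """Lower bound 2^(h - 1) on the maximum deviation, with h in bits."""
    return 2.0 ** (h_bits - 1.0)


def uniform_entropy_bits(d):
    """Differential entropy (bits) of a uniform density on [-d, d]."""
    return math.log2(2.0 * d)


def triangular_entropy_bits(d):
    """Differential entropy (bits) of a symmetric triangular density on [-d, d].

    In nats the entropy is 1/2 + ln(d); dividing by ln(2) converts to bits.
    """
    return (0.5 + math.log(d)) / math.log(2.0)


# Assumed maximum deviation of the error (hypothetical value for illustration).
D = 3.0

# Uniform error: the bound is expected to coincide with D (equality case).
uniform_bound = bound_from_entropy_bits(uniform_entropy_bits(D))

# Triangular error: same support, lower entropy, so the bound is strictly below D.
triangular_bound = bound_from_entropy_bits(triangular_entropy_bits(D))

print("maximum deviation D:      ", D)
print("bound, uniform error:     ", uniform_bound)
print("bound, triangular error:  ", triangular_bound)
```

Under these assumptions the uniform error attains the bound, while the triangular error yields $D\sqrt{e}/2$, roughly $0.82\,D$, which is consistent with equality requiring a uniform error.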