Finite sample properties of system identification of ARX models under mixing conditions

The asymptotic convergence properties of system identification methods are well understood, but comparatively little is known about the practical situation in which only a finite number of data points is available. In this paper we study the finite sample properties of prediction error methods for system identification, considering ARX models and uniformly bounded criterion functions. The question we pose is: how many data points are required to guarantee, with high probability, that the expected value of the identification criterion is close to its empirical mean value? The sample sizes are obtained using generalisations of risk minimisation theory to weakly dependent processes. We derive uniform probabilistic bounds on the difference between the expected value of the identification criterion and its empirical value evaluated on the observed data points. The bounds are very general; in particular, no assumption is made that the true system belongs to the model class. Further analysis shows that, in order to maintain a given bound on this difference, the number of data points required grows at most polynomially in the model order, and in many cases no faster than quadratically. These results generalise previous results derived for the case where the observed data are independent and identically distributed.
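To make the question concrete, the guarantee sought is of the following general form (the notation here is ours, given as an illustration of this type of bound rather than the paper's exact statement): for a parameter set $\Theta$, a uniformly bounded loss $\ell$, and observations $z_1, \dots, z_N$ from a stationary weakly dependent process,

$$
\Pr\left( \sup_{\theta \in \Theta} \left| \frac{1}{N} \sum_{t=1}^{N} \ell(\theta; z_t) - \mathbb{E}\,\ell(\theta; z) \right| > \varepsilon \right) \le \delta(N, \varepsilon),
$$

and the sample-size question asks how large $N$ must be for $\delta(N, \varepsilon)$ to fall below a prescribed confidence level, uniformly over $\Theta$ and without assuming the true system lies in the model class.

The gap between the empirical and expected criterion values is easy to observe numerically. The sketch below is a minimal illustration under assumed settings: a first-order ARX system, least-squares prediction error estimation, and a clipped squared-error loss standing in for a uniformly bounded criterion. None of these specifics (system, orders, cap value) come from the paper; the expected criterion is simply approximated by Monte Carlo on a long independent realisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arx(N, a=0.7, b=1.0, noise=0.5):
    """Simulate y(t) = a*y(t-1) + b*u(t-1) + e(t) with white inputs and noise."""
    u = rng.standard_normal(N)
    e = noise * rng.standard_normal(N)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = a * y[t - 1] + b * u[t - 1] + e[t]
    return u, y

def criterion(theta, u, y, cap=10.0):
    """Mean clipped squared one-step prediction error (a uniformly bounded loss)."""
    a_hat, b_hat = theta
    pred = a_hat * y[:-1] + b_hat * u[:-1]
    err2 = (y[1:] - pred) ** 2
    return np.mean(np.minimum(err2, cap))

def fit_ls(u, y):
    """Least-squares (prediction error) estimate of (a, b) for an ARX(1,1) model."""
    phi = np.column_stack([y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(phi, y[1:], rcond=None)
    return theta

for N in [50, 200, 1000, 5000]:
    u, y = simulate_arx(N)
    theta = fit_ls(u, y)
    emp = criterion(theta, u, y)                 # empirical criterion value
    u2, y2 = simulate_arx(200_000)               # long fresh realisation
    exp_val = criterion(theta, u2, y2)           # Monte Carlo expected value
    print(f"N={N:5d}  empirical={emp:.4f}  expected~{exp_val:.4f}  "
          f"gap={abs(exp_val - emp):.4f}")
```

As N grows, the printed gap shrinks; the bounds in the paper quantify how fast this must happen, uniformly over the model class, when the data are weakly dependent rather than i.i.d.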
