Cyclic Seesaw Process for Optimization and Identification

A known approach to optimization is the cyclic (or alternating, or block coordinate) method, where the full parameter vector is divided into two or more subvectors and the process proceeds by sequentially optimizing each subvector while holding the remaining parameters at their most recent values. One advantage of such a scheme is that it preserves potentially large investments in existing software while allowing the estimation capability to be extended to new parameters. A specific case of interest involves cross-sectional data modeled in state–space form, where the aim is to estimate the mean vector and covariance matrix of the initial state vector as well as certain parameters associated with the dynamics of the underlying differential equations (e.g., power spectral density parameters). This paper shows that, under reasonable conditions, the cyclic scheme leads to parameter estimates that converge to the optimal joint value for the full vector of unknown parameters. The convergence conditions here differ from others in the literature. Further, relative to standard search methods applied to the full vector, the numerical results here suggest a more general property of faster convergence for the seesaw scheme, a consequence of the more “aggressive” (larger) gain coefficient (step size) that it allows.
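To make the cyclic (seesaw) pattern concrete, the following is a minimal sketch, not the paper's algorithm: it alternates gradient steps on two parameter subvectors of an illustrative coupled quadratic loss, with each subvector updated while the other is held at its most recent value. The loss function, step size, and variable names are assumptions chosen only for illustration; in the state-space identification setting, each subvector update would instead be a step of the corresponding (e.g., likelihood-based) search.

```python
import numpy as np

def loss(theta1, theta2):
    # Illustrative strongly convex loss with coupling between the two subvectors.
    return np.sum(theta1**2) + np.sum((theta2 - 1.0)**2) + 0.5 * np.dot(theta1, theta2)

def grad_theta1(theta1, theta2):
    # Gradient of the loss with respect to the first subvector.
    return 2.0 * theta1 + 0.5 * theta2

def grad_theta2(theta1, theta2):
    # Gradient of the loss with respect to the second subvector.
    return 2.0 * (theta2 - 1.0) + 0.5 * theta1

def cyclic_seesaw(theta1, theta2, step=0.3, n_cycles=200):
    """One gradient step per subvector per cycle, holding the other subvector
    fixed at its most recent value (the seesaw pattern)."""
    for _ in range(n_cycles):
        theta1 = theta1 - step * grad_theta1(theta1, theta2)  # update subvector 1
        theta2 = theta2 - step * grad_theta2(theta1, theta2)  # then subvector 2, using the new theta1
    return theta1, theta2

if __name__ == "__main__":
    t1, t2 = np.ones(3), np.zeros(3)
    t1, t2 = cyclic_seesaw(t1, t2)
    print("theta1 =", t1, "theta2 =", t2, "loss =", loss(t1, t2))
```

Because each subvector faces a better-conditioned subproblem than the full joint search, a larger step size can often be used per subvector update, which is the informal intuition behind the faster convergence observed in the paper's numerical results.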
