Large-Scale Regression: A Partition Analysis of the Least Squares Multisplitting

Indirect measurements of physical parameters of interest require a mathematical model from which these parameters are estimated using the gathered measurements. In least squares (LS) estimation, the parameters are obtained by solving a regression problem. The presence of dynamics, multiple sensors, and high sampling rates leads to high-dimensional regression matrices. This paper deals with solving such large-scale regression problems time-efficiently. We revisit Renaut’s least squares multisplitting (LSMS) technique, which solves the ordinary LS problem in parallel. The LSMS decomposes the design matrix column-wise into several blocks, and the global LS problem is subsequently replaced by an equivalent set of local LS problems that are solved in parallel. We study how the user should configure the partition of the multisplitting, propose a partition design based on a clustering analysis, and prove the consistency of this approach. The method is illustrated with dedicated numerical simulations on a highly scalable LS-based engineering problem: frequency response function (FRF) estimation in the presence of missing output samples. Finally, its practical utility is demonstrated on a laboratory measurement application.
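To make the column-wise splitting concrete, the following is a minimal, dependency-free Python sketch of a multisplitting iteration in the spirit of LSMS. It is not the authors' implementation. Assumptions: the names `lsms` and `solve_local_ls` are hypothetical; the partition is supplied as a list of disjoint column-index blocks; each local LS problem is solved through the normal equations; the local solves run sequentially here (they are independent and could be distributed); and the local solutions are recombined with fixed convex weights 1/p, i.e. a damped block-Jacobi update, which is only one possible recombination. With A = [A_1, ..., A_p] and current iterate x, block i solves min over y_i of ||b - sum_{j != i} A_j x_j - A_i y_i||_2. For a full-column-rank A, this particular weighting converges to the global LS solution, because the step 1/p keeps the block-preconditioned gradient iteration contractive.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major: a[row][col]


def solve_local_ls(cols: Sequence[Vector], rhs: Vector) -> Vector:
    """Solve min_y ||rhs - sum_j y[j] * cols[j]||_2 via the normal equations.

    Normal equations keep the sketch dependency-free; a QR factorization
    would be numerically preferable for ill-conditioned blocks.
    """
    q = len(cols)
    # Augmented normal-equation system [A_i^T A_i | A_i^T rhs]
    aug = [
        [sum(u * v for u, v in zip(cols[r], cols[c])) for c in range(q)]
        + [sum(u * v for u, v in zip(cols[r], rhs))]
        for r in range(q)
    ]
    # Gaussian elimination with partial pivoting
    for k in range(q):
        piv = max(range(k, q), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(k + 1, q):
            f = aug[r][k] / aug[k][k]
            for c in range(k, q + 1):
                aug[r][c] -= f * aug[k][c]
    # Back substitution
    y = [0.0] * q
    for k in reversed(range(q)):
        s = sum(aug[k][c] * y[c] for c in range(k + 1, q))
        y[k] = (aug[k][q] - s) / aug[k][k]
    return y


def lsms(a: Matrix, b: Vector, blocks: Sequence[Sequence[int]], iters: int = 200) -> Vector:
    """Column-wise multisplitting iteration for min_x ||b - A x||_2.

    `blocks` must partition the column indices of A (disjoint, covering all).
    Local solutions are recombined with fixed weights 1/p (damped block Jacobi).
    """
    m, n = len(a), len(a[0])
    columns = [[a[r][c] for r in range(m)] for c in range(n)]
    p = len(blocks)
    x = [0.0] * n
    for _ in range(iters):
        # Shared global residual r = b - A x
        resid = [b[r] - sum(a[r][c] * x[c] for c in range(n)) for r in range(m)]
        new_x = x[:]
        # The block solves below are independent and could run in parallel
        for blk in blocks:
            # Local right-hand side: b - sum_{j not in blk} A_j x_j = r + A_blk x_blk
            local_rhs = [resid[r] + sum(columns[c][r] * x[c] for c in blk) for r in range(m)]
            y = solve_local_ls([columns[c] for c in blk], local_rhs)
            for c, y_c in zip(blk, y):
                new_x[c] = x[c] + (y_c - x[c]) / p  # convex combination, weight 1/p
        x = new_x
    return x


if __name__ == "__main__":
    A = [
        [1.0, 0.0, 0.1, 0.0],
        [0.0, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.0],
        [0.0, 0.1, 0.0, 1.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x_split = lsms(A, b, blocks=[[0, 1], [2, 3]])
    # Reference: the global LS solution, treating all columns as one block
    x_global = solve_local_ls([[row[c] for row in A] for c in range(4)], b)
    print(x_split)
    print(x_global)
```

The partition `[[0, 1], [2, 3]]` is picked by hand in this sketch; choosing it is precisely the question the paper addresses with its clustering-based partition design, which the sketch does not reproduce. In this simple weighting scheme, the convergence rate depends on the coupling between blocks: strongly correlated columns placed in different blocks slow the iteration down.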
