On obtaining sparse semantic solutions for inverse problems, control, and neural network training

Abstract: Modern-day techniques for designing neural network architectures rely heavily on trial and error, heuristics, and so-called best practices, without much rigorous justification. After a network architecture is chosen, an energy function (or loss) is minimized using one of a wide variety of optimization and regularization methods. Given the ad hoc nature of network architecture design, it would be useful if the optimization led to a sparse solution, so that one could ascertain the importance or unimportance of various parts of the network architecture. Historically, sparsity has been a useful notion for inverse problems, where researchers often prefer the L1 norm over the L2 norm. Similarly, in control, one often includes the control variables in the objective function in order to minimize the control effort. Motivated by the design and training of neural networks, we propose a novel column space search approach that emphasizes the data over the model, as well as a novel iterative Levenberg-Marquardt algorithm that smoothly converges to a regularized SVD, as opposed to the abrupt truncation inherent to PCA. For our iterative Levenberg-Marquardt algorithm, it suffices to consider only the linearized subproblem in order to verify our claims. However, the claims we make about our column space search approach require examining how the solution method for the linearized subproblem affects the fully nonlinear original problem; thus, we consider a complex real-world inverse problem (determining facial expressions from RGB images).
