Discriminative Optimization: Theory and Applications to Computer Vision

Many computer vision problems are formulated as the optimization of a cost function. This approach faces two main challenges: designing a cost function with a local optimum at an acceptable solution, and developing an efficient numerical method to search for this optimum. While designing such functions is feasible in the noiseless case, the stability and location of local optima are mostly unknown under noise, occlusion, or missing data. In practice, this can result in undesirable local optima, or in no local optimum at the expected location. Moreover, numerical optimization algorithms in high-dimensional spaces are typically local and often rely on expensive first- or second-order information to guide the search. To overcome these limitations, we propose Discriminative Optimization (DO), a method that learns search directions from data without the need for a cost function. DO explicitly learns a sequence of updates in the search space that leads to stationary points corresponding to the desired solutions. We provide a formal analysis of DO and illustrate its benefits on the problems of 3D registration, camera pose estimation, and image denoising. We show that DO outperforms or matches state-of-the-art algorithms in terms of accuracy, robustness, and computational efficiency.
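To make the idea of a learned sequence of updates concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes each update is a linear map applied to a feature vector, x_{t+1} = x_t - D_{t+1} h(x_t), and that each map is fit by ridge regression on training pairs; the abstract itself does not specify these details. The feature function `features`, the synthetic 2D data, and the names `train_do` and `apply_do` are all hypothetical.

```python
import numpy as np


def features(x, y):
    # Hypothetical feature function h: a saturating function of the gap
    # between the current estimates x and the observations y.
    return np.tanh(x - y)


def train_do(x_star, y, x0, n_maps=10, lam=1e-3):
    """Learn a sequence of linear update maps (assumed form, see lead-in).

    x_star: (N, d) ground-truth solutions, used only during training.
    y:      (N, d) observations available at both training and test time.
    x0:     (N, d) initial estimates.
    """
    x = x0.copy()
    maps = []
    for _ in range(n_maps):
        H = features(x, y)             # (N, k) features at current estimates
        R = x - x_star                 # (N, d) step that would reach x_star
        # Ridge regression for D such that D @ h_i approximates r_i.
        A = H.T @ H + lam * np.eye(H.shape[1])
        D = np.linalg.solve(A, H.T @ R).T   # (d, k)
        maps.append(D)
        x = x - H @ D.T                # apply the update to the training set
    return maps


def apply_do(maps, y, x0):
    # At test time no cost function is evaluated: the learned maps are
    # simply applied in order.
    x = x0.copy()
    for D in maps:
        x = x - features(x, y) @ D.T
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_star = rng.uniform(-1.0, 1.0, size=(500, 2))
    y = x_star + 0.01 * rng.standard_normal(x_star.shape)
    maps = train_do(x_star, y, np.zeros_like(x_star))

    # Fresh instances drawn from the same synthetic distribution.
    x_test = rng.uniform(-1.0, 1.0, size=(200, 2))
    y_test = x_test + 0.01 * rng.standard_normal(x_test.shape)
    x_hat = apply_do(maps, y_test, np.zeros_like(x_test))
    print("mean error before:", np.linalg.norm(x_test, axis=1).mean())
    print("mean error after: ", np.linalg.norm(x_hat - x_test, axis=1).mean())
```

In this sketch the ground truth enters only through the regression targets during training; inference uses the observations and the stored maps alone, which is the sense in which the search directions are learned from data rather than derived from a cost function.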
