Monotonic Differentiable Sorting Networks

Differentiable sorting algorithms enable training with sorting and ranking supervision, where only the ordering or ranking of samples is known. Various methods have been proposed for this setting, ranging from optimal-transport-based differentiable Sinkhorn sorting to differentiable relaxations of classic sorting networks. A shortcoming of current differentiable sorting methods is that they are non-monotonic. To address this, we propose a novel relaxation of the conditional swap operation that guarantees monotonicity of differentiable sorting networks. We introduce a family of sigmoid functions and prove that the sorting networks they induce are monotonic. Monotonicity ensures that the gradients always have the correct sign, which is an advantage in gradient-based optimization. We demonstrate that monotonic differentiable sorting networks improve upon previous differentiable sorting methods.
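
To make the core idea concrete, the sketch below (not the authors' implementation) shows a differentiable odd-even sorting network built from relaxed conditional swaps: each comparator blends the soft minimum and soft maximum of its two inputs via a sigmoid of their difference. The plain logistic sigmoid and the `steepness` parameter are illustrative stand-ins only; the paper's contribution is to replace this choice with a family of sigmoid functions for which the resulting network is provably monotonic.

```python
# Minimal sketch of a differentiable odd-even sorting network.
# The logistic sigmoid is used here purely for illustration; it is the
# standard relaxation and is not the monotonic variant proposed in the paper.
import torch

def soft_swap(a, b, steepness=10.0):
    """Relaxed conditional swap: soft min/max of (a, b) blended by a sigmoid."""
    alpha = torch.sigmoid(steepness * (b - a))  # ~1 if already ordered (a <= b)
    soft_min = alpha * a + (1 - alpha) * b
    soft_max = alpha * b + (1 - alpha) * a
    return soft_min, soft_max

def odd_even_sort(x, steepness=10.0):
    """Differentiable odd-even transposition network over the last dimension."""
    values = list(x.unbind(dim=-1))
    n = len(values)
    for layer in range(n):
        start = layer % 2  # alternate between odd and even comparator layers
        for i in range(start, n - 1, 2):
            values[i], values[i + 1] = soft_swap(values[i], values[i + 1], steepness)
    return torch.stack(values, dim=-1)

# Usage: gradients flow from a loss on the (soft) sorted output back to the inputs.
scores = torch.tensor([0.3, -1.2, 2.0, 0.7], requires_grad=True)
sorted_scores = odd_even_sort(scores)
sorted_scores.sum().backward()
print(sorted_scores, scores.grad)
```

With the logistic sigmoid, such relaxations can yield gradients of the wrong sign for some inputs; monotonicity of the relaxed comparator is exactly the property the proposed family of sigmoid functions is designed to guarantee.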
