Sample-then-optimize posterior sampling for Bayesian linear models

In modern machine learning it is common to train models with extremely high intrinsic capacity. The results are often initialization-dependent, vary with the choice of optimizer, and in some cases are obtained without any explicit regularization. This raises difficult questions about generalization [1]. A natural way to approach such questions is a Bayesian one. There is therefore a growing literature attempting to understand how Bayesian posterior inference could emerge from the complexity of modern practice [2, 3], even when such inference is not the stated goal.
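To make the title concrete, the sketch below illustrates one common form of sample-then-optimize for a Gaussian linear model: draw parameters from the prior, randomly perturb the targets, and take the minimizer of the resulting regularized least-squares objective as a posterior sample. This is a minimal illustrative sketch, not the exact procedure analyzed here; the data, the variances `sigma_p` and `sigma_n`, and the closed-form solve are all assumptions made for the example.

```python
# Sketch: sample-then-optimize posterior sampling for a Gaussian linear model
# via a randomly perturbed least-squares objective. All quantities below are
# illustrative assumptions, not details taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = Phi @ theta_true + noise, with assumed variances.
n, d = 50, 10
sigma_p, sigma_n = 1.0, 0.1           # assumed prior and noise std. dev.
Phi = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = Phi @ theta_true + sigma_n * rng.normal(size=n)

# Exact Gaussian posterior N(mu, Sigma), for reference.
Sigma = np.linalg.inv(Phi.T @ Phi / sigma_n**2 + np.eye(d) / sigma_p**2)
mu = Sigma @ Phi.T @ y / sigma_n**2

def sample_then_optimize(rng):
    """Draw one posterior sample by minimizing a randomly perturbed loss."""
    theta0 = sigma_p * rng.normal(size=d)     # sample from the prior
    eps = sigma_n * rng.normal(size=n)        # perturb the observed targets
    # Minimizer of ||y + eps - Phi theta||^2 / sigma_n^2
    #            + ||theta - theta0||^2 / sigma_p^2.
    # Solved in closed form here; an iterative optimizer would reach the
    # same point since the objective is strictly convex.
    return Sigma @ (Phi.T @ (y + eps) / sigma_n**2 + theta0 / sigma_p**2)

samples = np.stack([sample_then_optimize(rng) for _ in range(5000)])
print(np.allclose(samples.mean(axis=0), mu, atol=0.05))   # matches posterior mean
print(np.allclose(np.cov(samples.T), Sigma, atol=0.05))   # matches posterior covariance
```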