Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code