Store Buffer Reduction in the Presence of Mixed-Size Accesses and Misalignment

Naive programmers believe that a multi-threaded execution of their program is some simple interleaving of steps of individual threads. To increase performance, modern Intel and AMD processors make use of store buffers, which cause unexpected behaviors that can not be explained by the simple interleaving model.