Fugue for MMX [parallel programming]

When I was an undergraduate in a hardware architecture course, one of my instructors compared programming for parallel processors to writing a symphony. At the time, parallel processors were largely theoretical, and there were none around for me to play with. Now MMX adds more instruments to the orchestra. This article documents some of my early experiences programming a simple compositing routine for MMX and the lessons I learned from it. The program deals with pixels composed of red, green, blue, and alpha (coverage) components, and it assumes that each pixel's RGB components have already been multiplied by that pixel's alpha component. I implement the most common image compositing operation, the Porter-Duff over operator.
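For readers who have not met the over operator, here is a minimal scalar sketch in plain C of what it computes for premultiplied pixels: each output component is the source component plus the destination component scaled by one minus the source alpha. This is not the MMX routine itself. The 8-bit component size, the names `Pixel`, `mul255`, and `over`, and the rounding method are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical pixel layout: 8-bit components, with r, g, b assumed
   to be already multiplied by a (premultiplied alpha). */
typedef struct { uint8_t r, g, b, a; } Pixel;

/* Treat x and y as fractions of 255 and return their product,
   rounded to the nearest 8-bit value (x * y / 255 without a divide). */
static uint8_t mul255(uint8_t x, uint8_t y) {
    unsigned t = (unsigned)x * y + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Porter-Duff over for premultiplied pixels:
   out = src + dst * (1 - src.a), applied to all four components.
   For valid premultiplied input (each color component <= its alpha)
   the sums cannot exceed 255; invalid input would wrap around. */
static Pixel over(Pixel src, Pixel dst) {
    uint8_t inv = (uint8_t)(255 - src.a);
    Pixel out;
    out.r = (uint8_t)(src.r + mul255(dst.r, inv));
    out.g = (uint8_t)(src.g + mul255(dst.g, inv));
    out.b = (uint8_t)(src.b + mul255(dst.b, inv));
    out.a = (uint8_t)(src.a + mul255(dst.a, inv));
    return out;
}
```

All four components go through the same arithmetic, which is why this operation is a natural candidate for a processor that works on several packed values at once.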
