Optimizing C++ Vector Expressions

Recently, I've been entertaining myself by studying all the nifty new programming techniques that have been invented since I was in school. I started with C++ and object-oriented programming. I'm now progressing through generic programming, aspect-oriented programming, partial evaluation and generative programming. What I'm really interested in is whether all these tricks work in the real world of graphics programming. My answer so far is "Yes, but...".

To see "but what?", this article defines one of the problems I want to solve. I want a programming language that defines vectors and the arithmetic operations between them. C++ allows me to do this, but there are various pitfalls in doing it well. This article addresses one of these pitfalls: the speed of execution of vector arithmetic. The conventional approach turns out to be somewhat slow, but there's a very tricky technique that can make vector arithmetic very fast. It's based on the work of T. Veldhuizen (1998) and uses the C++ template mechanism in bizarre and unexpected ways.
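To give a rough idea of the contrast, here is a minimal sketch, assuming a three-component vector of doubles. The names Vec3, Sum, add_naive and sum_three are made up for illustration and are not taken from Veldhuizen's code. The sketch assumes the "conventional approach" means returning a fresh vector from every operation, and it shows only the general shape of the template trick: operator+ builds a small description of the sum, and the arithmetic runs in a single loop at assignment time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical expression node (illustrative name): describes "l + r"
// without computing anything yet.
template <class L, class R> struct Sum;

// Hypothetical three-component vector (illustrative, not from the article).
struct Vec3 {
    double v[3];

    double operator[](std::size_t i) const { assert(i < 3); return v[i]; }

    // Assigning from an expression evaluates it element by element,
    // in one loop, with no intermediate Vec3 objects.
    template <class L, class R>
    Vec3& operator=(const Sum<L, R>& e) {
        for (std::size_t i = 0; i < 3; ++i) v[i] = e[i];
        return *this;
    }
};

template <class L, class R>
struct Sum {
    const L& l;  // references only: valid within one full expression
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Assumed "conventional" style: every addition returns a new Vec3.
inline Vec3 add_naive(const Vec3& a, const Vec3& b) {
    return Vec3{{a[0] + b[0], a[1] + b[1], a[2] + b[2]}};
}

// Template style: operator+ only records its operands.
inline Sum<Vec3, Vec3> operator+(const Vec3& a, const Vec3& b) {
    return Sum<Vec3, Vec3>{a, b};
}

template <class L, class R>
Sum<Sum<L, R>, Vec3> operator+(const Sum<L, R>& a, const Vec3& b) {
    return Sum<Sum<L, R>, Vec3>{a, b};
}

inline Vec3 sum_three(const Vec3& a, const Vec3& b, const Vec3& c) {
    Vec3 r{};
    r = a + b + c;  // builds Sum<Sum<Vec3, Vec3>, Vec3>, then runs one loop
    return r;
}
```

Written with add_naive, the same sum would be add_naive(add_naive(a, b), c), which creates a temporary vector for the inner result. In the template version the nested Sum objects hold only references, so they must be consumed in the same expression that creates them; storing one in a variable for later use would leave dangling references.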