Sunday, June 8, 2008

Counter-intuitive optimisation

I've always been interested in game programming. I never actually managed to create a full, completed game, but from time to time I do start another project. Currently I'm messing around with the XNA Game Studio 3.0 CTP. When Googling for some answers on some game-programming-related problem, I ran into a developers journal on The journal was pretty interesting and I started reading all the entries. It's about a developer who's creating a space MMO. This particular developer mainly focuses on the graphics engine, so while all the specific graphics lingo and math is lost to me, I do understand the more general programming bits and one bit really surprised me.

On November 7, 2007 he wrote a piece about optimising a noise function. His project uses a noise function coded in a high level programming language and it appeared too slow. So this developer decided, like many other developers would, to rewrite the function in low-level assembly to gain performance. This kind of manual optimisation is very hard, but generally yields pretty good results. But when he hand-optimised the function it wasn't faster. In fact it was 2 times slower then the implementation written in the high-level programming language. Programming in assembly language is hard. It's very low-level and you need a really good understanding of how the CPU architecture works to get the best out of it. So it's not surprising that a compiler written by a lot of very smart people will produce faster code then a single, simple developer can product on his own. But this didn't appear to be the problem. No, the problem was using a lookup table.

A lookup-table is a very simple (and usually very effective) optimisation technique. If the calculations are slow, but limited in volume, it's easy to just calculate all the possible values one time and store it so that later calculations don't need to be calculated, but can just be looked up. In a lot of game-programming books I read this was usually the first optimisation authors would recommend. It's easy and it just works. Not anymore it seems. After the developer had reprogrammed the noise function in assembly language the bottleneck didn't appear to be the calculations, but the memory accessing in the lookup-table. So by eliminating the lookup-table and just calculating each thing every time, he made the function faster. And that is not something I would have though of.

The funny thing is, this is actually something Steve McConnell warns about in Code Complete. In this excellent book he warns about optimising performance by 'guessing' what's faster. Things that 'feel' like they would be faster aren't necessarily faster. And things that where faster last year, won't necessarily be faster on this year's CPUs, compilers and operating systems. The only one true way to determine what is faster and what is not, is by measuring.

No comments: