I didn't mean to leave it quite so long since last weekend's update, but I've been working hard on a number of optimisations for R10. Oddly enough these are mostly new issues that I've found - most of them weren't on the list of tasks I came up with a couple of weeks ago. I think that shows how much scope there is for optimising Daedalus!
Firstly, I finally managed to get Daedalus compiling with GCC's '-O3' setting. This flag enables GCC's most aggressive standard level of optimisation. When I've tried to enable this flag in the past I've had numerous strange crashes and odd behaviour, so all releases of Daedalus to date have been compiled with -O1.
I updated my local installation of the PSPSDK last weekend and decided to try the -O3 setting again. I was pleased to find that Daedalus ran without crashing, but there was still some odd behaviour which I eventually tracked down to my use of the famous InvSqrt function. You can read a bit more about my findings on the pspdev forums.
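For readers who haven't met it, here is a minimal sketch of an InvSqrt-style function written so that it behaves the same at -O1 and -O3. The classic version reinterprets the float's bits through a pointer cast, which breaks C++'s strict-aliasing rules, and higher optimisation levels are allowed to exploit that. Whether this was the exact cause of the odd behaviour described above is an assumption on my part; the sketch simply shows the usual safe form, copying the bits with std::memcpy instead.

```cpp
#include <cstdint>
#include <cstring>

// Fast approximate 1/sqrt(x) for positive x.
// The float <-> integer reinterpretation goes through std::memcpy rather than
// a pointer cast, so the compiler cannot reorder or drop it under -O3.
inline float InvSqrt(float x)
{
    const float half = 0.5f * x;

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));   // read the float's bit pattern
    bits = 0x5f3759dfu - (bits >> 1);       // well-known initial guess

    float y;
    std::memcpy(&y, &bits, sizeof(y));      // back to a float
    return y * (1.5f - half * y * y);       // one Newton-Raphson refinement
}
```

Compilers generally turn the two memcpy calls into plain register moves, so this form should cost nothing extra over the pointer-cast version.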
Enabling -O3 tends to slightly increase the code size (the EBOOT.PBP has increased from around 850KB to 900KB), but the speedup is quite noticeable - my estimate is that Daedalus runs around 5% faster with -O3 than with -O1.
As a result of the thread I started on the pspdev forums, hlide and Raphael both came up with some great suggestions for how I could optimise my use of the VFPU.
When I originally wrote the VFPU code for TnL and clipping there were still many undocumented/unsupported instructions. A few months down the line, hlide and co have discovered a couple of instructions which are perfect for my needs - namely vuc2i and vc2i. These two instructions take a 32-bit value made up of four unsigned (vuc2i) or signed (vc2i) 8-bit chars and unpack them into a vector of four 32-bit fixed-point numbers. That makes them ideal for converting the N64's packed colour and normal values into a format I can use in the VFPU code.
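To make the unpacking concrete, here is a scalar C++ sketch of what this kind of conversion does. It is an illustration, not the VFPU code: the function names are hypothetical, and the exact bit layout (signed bytes moved to the top 8 bits of each lane, unsigned bytes replicated across the lane and shifted down by one) and the lane order (lowest byte to lane 0) are my understanding of the instructions rather than something confirmed here.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar stand-in for a vc2i-style unpack:
// each signed byte ends up in the top 8 bits of its own 32-bit lane.
inline std::array<std::int32_t, 4> UnpackSigned(std::uint32_t packed)
{
    std::array<std::int32_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t byte = (packed >> (8 * i)) & 0xFFu;
        out[i] = static_cast<std::int32_t>(byte << 24);
    }
    return out;
}

// Hypothetical scalar stand-in for a vuc2i-style unpack (assumed layout):
// the byte is replicated across the lane, then shifted right by one so the
// result stays non-negative. 0xFF maps to 0x7FFFFFFF.
inline std::array<std::int32_t, 4> UnpackUnsigned(std::uint32_t packed)
{
    std::array<std::int32_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t byte = (packed >> (8 * i)) & 0xFFu;
        out[i] = static_cast<std::int32_t>((byte * 0x01010101u) >> 1);
    }
    return out;
}

// Interpret a lane as fixed point with 31 fractional bits.
inline float FixedToFloat(std::int32_t v)
{
    return static_cast<float>(v) / 2147483648.0f;   // divide by 2^31
}
```

With this layout a signed normal component of 0x40 comes out as 0.5 and 0x80 as -1.0, while an unsigned colour component of 0xFF comes out as (very nearly) 1.0, which is the sort of normalised range the lighting code wants.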
The various VFPU tweaks I've made have given Daedalus another 5% or so speedup.
The final set of changes I've been working on this week have been to do with how I handle certain blend modes. Some of the N64 blend modes are too complex for the PSP to deal with precisely, so I have a large table of 'override' blend modes which allow me to make as good an approximation of the N64 mode as possible. It turned out that looking up these blend modes was very expensive, so I've rewritten how this is handled to make it more efficient. The end result is another small speedup.
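As a rough illustration of how a lookup like this can be made cheap, here is a minimal sketch. It is not the actual Daedalus code: the entry layout, the 64-bit key, the table contents and the names are all hypothetical. It combines two common tricks, a table sorted by key so it can be binary-searched, and a one-entry cache for the common case where consecutive draw calls use the same blend mode.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical override entry: an N64 blend mode key and the id of the
// PSP setup chosen to approximate it.
struct OverrideEntry {
    std::uint64_t n64_mode;
    int           psp_mode;
};

// Placeholder table contents. Must be kept sorted by n64_mode.
static const OverrideEntry kOverrides[] = {
    {0x0010u, 1},
    {0x0200u, 2},
    {0x3000u, 3},
};

// Returns the override id for 'mode', or -1 if there is no override.
inline int LookupOverride(std::uint64_t mode)
{
    // One-entry cache: repeated lookups of the same mode skip the search.
    static bool          have_last   = false;
    static std::uint64_t last_mode   = 0;
    static int           last_result = -1;
    if (have_last && mode == last_mode)
        return last_result;

    const OverrideEntry* first = std::begin(kOverrides);
    const OverrideEntry* last  = std::end(kOverrides);
    const OverrideEntry* it = std::lower_bound(
        first, last, mode,
        [](const OverrideEntry& e, std::uint64_t key) { return e.n64_mode < key; });

    const int result = (it != last && it->n64_mode == mode) ? it->psp_mode : -1;

    have_last   = true;
    last_mode   = mode;
    last_result = result;
    return result;
}
```

A hash map would work just as well for the search itself; the sorted array is used here only because it needs no allocation and is friendly to a small cache.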
Overall these three changes give a combined 10-15% speedup on the various games I've tested, although there are ROMs that lie outside this range (some show an even greater speedup while others are more or less unaffected by the changes).
There's still quite a lot more in the way of optimisations that I want to get in for Daedalus R10 (mostly stuff I mentioned earlier) so hopefully these numbers will improve even further over the next couple of weeks.