In my daily job I rely heavily on debuggers and profilers to discover bottlenecks in the working on the Xbox, Microsoft provided some excellent performance analysis tools (I see they've finally released PIX for Windows). These days I tend to use AQtime as I'm PC based (it's also one of the few profilers I've found that can handle the size of our libraries at work without grinding to a shuddering halt.)
Without these kind of tools it's a lot tougher profiling on the PSP. Over the past few months I've built a number of custom profiling tools into Daedalus to help me figure out where all the time is going, but the numbers I get out tend to be quite vague, and there's usually quite a large margin of error. I think this explains why the unexpected optimisation I've just found went undiscovered for so long.
A couple of days ago I was browsing the ps2dev forums and came across this post. I was about to back out after a quick scan, when I noticed this comment from Soatome:
but one waits for the vblank
...and that's sceCtrlReadBufferPositive (which you're using)
you should use sceCtrlPeekBufferPositive instead.
That's when I realised that when Daedalus was emulating a rom, it was stalling for a frame every time the rom read the status of the pad*. In other words by changing one line of code in Daedalus from
I could get on average an instant 1fps speedup across all roms. What's more, I knew some roms read from the pad multiple times each frame, so they would see an even great speedup.
Frustratingly I had to wait a couple of days before I could try this out. As I mentioned earlier I'm in the process up moving over to a new PC, and I had just moved Perforce over but hadn't set up the pspsdk, which required Cygwin. Daedalus requires libpng and zlib so I had to download and build them too. Then I had to set up Psplink, PuTTY and a whole host of other tools. You get the picture...
Last night I finally managed to get a new build together with the updated code, and the results were every bit as good as I'd expected. In some cases I had to restart the rom just to make sure I wasn't mistaken. I know most of you just want to see some numbers, so here's a few of my observations:
Mario now runs at at steady 15fps in most places, and around 20fps indoors etc (it reaches over 35fps in the main menu, and close to 30 in some scenes.) Zelda now runs at around 8fps in game, and up to 20fps in certain places. The 'nintendo' logo at the start runs at over 90fps :D The MarioKart Nintendo logo now runs at 30fps, and the main menu (with the flag) runs at a solid 15fps. In game it's a comfortable 12fps. Starfox runs at around 15fps - the intro runs at 25-30fps. Quest64 runs at 20fps.
So all in all it's a pretty amazing improvement for a single-line change. Having said that, I think it would be a mistake to assume that this is an instant fix that will suddenly make everything fully-playable. Although some of the framerates I list above are excellent - faster than an native n64 even - not all roms show this improvement. Don't assume that all roms now run at 15+fps (because they don't.) There's still a lot more work to do to get from a sluggish 8fps to a more playable 15fps (in Zelda for instance). I still need to save a lot more cycles in order to support other features such as sound.
Because this change makes such a big improvement I'm going to try and get another release out sooner rather than later. I don't like releasing builds too often as I think each revision should something worthwhile, but I think this qualifies :) There are a couple of other optimisations I want to get in this build, so while it might be ready this weekend, sometime early next week is more likely. The new features I had planned for this build will have to wait until R9.
As always, I'll keep you posted.
*This actually reminds me of a funny story from one of the Xbox games I was working on. We were investigating a sudden slowdown that had been appeared a few days previously. Somehow I realised that the framerate doubled when you unplugged all the controllers. As it turned out someone was accidentally reinitialising the USB hub every frame, and removing the all controllers prevented this from happening.