- Most integer arithmetic and logical instructions now implemented (i.e. I'm now generating optimised assembly for these instructions rather than calling a generic function to handle them)
- Register caching implemented (although I'm only using a greedy allocation algorithm at the moment, as I've not yet fully implemented the fast linear scan algorithm I talked about in the previous post)
- I'm directly linking all direct branches to compiled fragments
- I'm linking to all indirect branch targets
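To give a feel for what a greedy register caching pass might look like, here's a minimal sketch. It is not the Daedalus code: the fragment format, the `greedy_allocate` name and the list of host registers are all made up for illustration, and "greedy" is taken here to mean "give the host registers to whichever N64 registers are used most often in the fragment".

```python
from collections import Counter

# Hypothetical host registers set aside for caching N64 registers.
HOST_REGS = ["s0", "s1", "s2", "s3"]

def greedy_allocate(fragment, host_regs=HOST_REGS):
    """Map the most frequently used N64 registers to host registers.

    `fragment` is a list of (opcode, [N64 registers touched]) tuples
    (an assumed format). Registers that miss out are simply not cached
    and would be read from / written to memory on every use.
    """
    # r0 is hardwired to zero on MIPS, so caching it is pointless.
    usage = Counter(reg for _, regs in fragment for reg in regs if reg != "r0")
    # Most used first; ties broken by name so the result is deterministic.
    ranked = sorted(usage, key=lambda reg: (-usage[reg], reg))
    return dict(zip(ranked, host_regs))

fragment = [
    ("ADDU", ["t0", "t1", "t2"]),
    ("ADDIU", ["t0", "t0"]),
    ("SLT", ["t3", "t0", "t1"]),
    ("BNE", ["t3", "r0"]),
]
allocation = greedy_allocate(fragment, host_regs=["s0", "s1"])
print(allocation)
```

With only two host registers available, the two most heavily used N64 registers in the fragment get cached and the rest stay in memory. A linear scan allocator would instead look at where each register is live, so a host register can be reused once its current occupant is no longer needed.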
So far I'd say I'm around 40-50% through the work on the dynarec engine.
Now for some stats :) The following table compares the framerates at various points (previous framerate is for the R4 release of Daedalus, current framerate is for my most recent development build):
| Scene | Previous Framerate (Hz) | Current Framerate (Hz) |
| --- | --- | --- |
| Mario Head | 3 | 6 |
| Mario Main Menu | 14 | 25 |
| Mario Peach Letter | 6-7 | 11 |
| Mario Flyby (under bridge) | 6 | 10 |
| Mario In Game | 5-6 | 9 |
| Mario Kart Nintendo logo | 10 | 23 |
| Mario Kart Flag | 6 | 11 |
| Mario Kart Menu | 7 | 11 |
| Zelda Nintendo Logo | 20 | 23 |
| Zelda Start Menu | 2-3 | 4 |
| Zelda Main Menu | 10 | 13 |
Overall I'd say the dynarec is currently achieving up to a 100% speedup in the roms I've tested, which I'm very excited about. Mario is certainly starting to feel a lot more playable, and the Mario Kart menus are a lot more responsive now.
I specifically included Zelda in the results because I'm not seeing the same kind of improvement with it, so I need to take a closer look at what's going on (it's quite possible it's just using a few of the arithmetic and logical ops I've not spent time optimising yet).
A twofold improvement in framerate is pretty good, but I now think I can do a lot better. Here's the list of things I currently have on my 'TODO' list:
- Fully implement all the remaining integer ops (including all the 64 bit instructions)
- Finalise implementation of the fast linear scan register allocation algorithm
- Keep track of 'known' values for specific registers and use this to optimise the generated code (e.g. most of the time the top half of the N64's 64 bit registers is just sign extended from the lower half)
- Cache the memory location pointed to by the N64 stack pointer (SP) and optimise loads/stores using this register as a base pointer
- Optimise all memory access instructions (currently all the cached registers get flushed for all memory accesses other than LW, SW, LWC1 and SWC1)
- Detect and optimise 'busy wait' loops (e.g. many roms sit in a tight loop waiting for the next vertical blank interrupt to fire, which is just wasting cycles on the PSP)
- Implement all the branching instructions (I've currently only implemented BNE, BEQ, BLEZ and BGTZ)
- Implement instructions and register caching for all the cop1 (floating point coprocessor) instructions. (I think this will give a huge speedup.)
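As a rough illustration of the 'known values' item above, here's a small sketch of tracking which registers are known to be sign extended. The class, the method names and the opcode set are hypothetical (and the set is deliberately incomplete); the only fact relied on is that the 32 bit MIPS ops listed leave a sign-extended result in the 64 bit register.

```python
# 32 bit ops whose 64 bit result is the sign extension of the low half.
SIGN_EXTENDING_OPS = {"ADDU", "ADDIU", "SUBU", "LW", "LUI", "SLL", "SRA"}

def sign_extend_32(lo):
    """Return the 64 bit register value implied by a 32 bit low half."""
    lo &= 0xFFFFFFFF
    return (lo | 0xFFFFFFFF00000000) if lo & 0x80000000 else lo

class KnownValues:
    """Tracks which N64 registers are known to be sign extended."""

    def __init__(self):
        self.sign_extended = {"r0"}  # r0 is always zero

    def record(self, opcode, dest):
        """Update the known state after `opcode` writes to `dest`."""
        if dest == "r0":
            return  # writes to r0 are discarded
        if opcode in SIGN_EXTENDING_OPS:
            self.sign_extended.add(dest)
        else:
            # Anything else (e.g. DADDU, LD) might set the top half freely.
            self.sign_extended.discard(dest)

    def compare_width(self, rs, rt):
        """Bits that must be compared to test rs == rt (e.g. for BEQ/BNE)."""
        both_known = rs in self.sign_extended and rt in self.sign_extended
        return 32 if both_known else 64

known = KnownValues()
known.record("ADDIU", "t0")
known.record("LW", "t1")
known.record("DADDU", "t2")
print(known.compare_width("t0", "t1"), known.compare_width("t0", "t2"))
```

The idea is that when both operands are known to be sign extended, comparing the low halves is enough, so the generated code can skip loading and comparing the top halves altogether.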
Although the list is quite short, there's quite a lot of work there. What I'm quite excited about is that I think these changes will start to provide significant speedups as they're implemented. I don't want to get too far ahead of myself, but I'm starting to feel that certain roms are going to be very playable in the not too distant future.
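To make the 'busy wait' item from the list a bit more concrete, here's one simple heuristic sketched out. It's an assumption about how detection could work rather than what the emulator does: the fragment format, the opcode sets and `is_busy_wait` are all invented for the example. A fragment is treated as a busy wait if its only branch jumps back to its own start and nothing in the body does anything other than load from memory.

```python
# Ops that only read state; an illustrative (not exhaustive) set.
SIDE_EFFECT_FREE = {"NOP", "LW", "LH", "LHU", "LB", "LBU"}
BRANCHES = {"BEQ", "BNE", "BLEZ", "BGTZ"}

def is_busy_wait(fragment, start_address):
    """Return True if `fragment` just spins until memory changes.

    `fragment` is a list of (opcode, branch_target_or_None) tuples
    (an assumed format), and `start_address` is where the fragment begins.
    """
    branches = [(op, target) for op, target in fragment if op in BRANCHES]
    # Exactly one branch, and it must loop back to the fragment's start.
    if len(branches) != 1 or branches[0][1] != start_address:
        return False
    # Any arithmetic or store means the loop is doing real work.
    return all(op in SIDE_EFFECT_FREE or op in BRANCHES for op, _ in fragment)

wait_loop = [("LW", None), ("BEQ", 0x80001000), ("NOP", None)]
counted_loop = [("ADDIU", None), ("BNE", 0x80001000), ("NOP", None)]

print(is_busy_wait(wait_loop, 0x80001000))     # spins on a memory location
print(is_busy_wait(counted_loop, 0x80001000))  # modifies a register each pass
```

Once a loop like this is spotted, the emulator could skip straight ahead to the next pending event (such as the vertical blank interrupt) instead of executing the loop over and over.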
I'm going to try and release a new version of the emulator soon. Unfortunately it's probably not going to be this weekend (due to various social commitments); towards the end of the following week is more likely. I'd certainly like to get a version released before the World Cup starts and all my free time is taken up watching football :)
-StrmnNrmn