I'll go into more detail in a later post, but in essence the problem came down to very rare situations where the trace recorder would exit a trace while a branch delay instruction was still pending. This caused the fragment generator to inadvertently skip the branch instruction, which produced the odd behaviour I was seeing.
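
To make the failure mode a bit more concrete, here is a toy model of it in Python. This is only a sketch and not the emulator's actual code: `Op`, `record_trace`, `generate_fragment` and the `defer_exit_in_delay_slot` flag are all hypothetical names, and I'm assuming a generator that only emits a branch together with its delay-slot instruction.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    is_branch: bool = False


def record_trace(ops, max_len, defer_exit_in_delay_slot):
    """Record up to max_len ops (hypothetical recorder).

    When defer_exit_in_delay_slot is True, the recorder refuses to exit
    while a branch delay instruction is pending and records one extra op.
    """
    trace = []
    delay_pending = False
    for op in ops:
        want_exit = len(trace) >= max_len
        if want_exit and not (defer_exit_in_delay_slot and delay_pending):
            break
        trace.append(op)
        # A branch leaves its delay-slot instruction pending.
        delay_pending = op.is_branch
    return trace


def generate_fragment(trace):
    """Emit ops for a trace (hypothetical generator).

    Assumption: a branch is emitted after its delay-slot op. If the trace
    ends before the delay slot was recorded, the branch is silently lost.
    """
    out = []
    i = 0
    while i < len(trace):
        op = trace[i]
        if op.is_branch:
            if i + 1 < len(trace):
                out.append(trace[i + 1].name)  # delay slot runs first
                out.append(op.name)            # then the branch itself
            i += 2
        else:
            out.append(op.name)
            i += 1
    return out


ops = [Op("addiu"), Op("bne", is_branch=True), Op("lw"), Op("sw")]

buggy = generate_fragment(record_trace(ops, 2, defer_exit_in_delay_slot=False))
fixed = generate_fragment(record_trace(ops, 2, defer_exit_in_delay_slot=True))
print(buggy)
print(fixed)
```

In the first case the trace ends on the `bne` with its delay slot still pending, so the branch never makes it into the fragment. In the second the recorder holds off exiting until the delay slot has been recorded.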
For reference, here are some updated figures for Super Mario 64 and Mario Kart (initial results are from a previous post). Generally the current changes seem to indicate an overall speedup of 20%-25%, which is great for a few days' work. What's even better is that I still haven't implemented all the optimisations I have planned for R7, so hopefully these numbers will look even better soon.
| Scene | R4 Framerate (Hz) | R5 Framerate (Hz) | Current Framerate (Hz) |
|---|---|---|---|
| Mario Main Menu | 14 | 25 | 30 |
| Mario Peach Letter | 6-7 | 11 | 13 |
| Mario Flyby (under bridge) | 6 | 10 | 12 |
| Mario In Game | 5-6 | 9 | 11 |
| Mario Kart Nintendo logo | 10 | 23 | 24 |
| Mario Kart Flag | 6 | 11 | 13 |
| Mario Kart Menu | 7 | 11 | 13 |
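
If you want to work out the per-scene speedup yourself, a quick sketch like the following does it. It assumes the comparison of interest is R5 against the current build; the `results` list and `speedup_percent` helper are just names made up for this example, with the numbers copied from the table.

```python
# (scene, R5 framerate in Hz, current framerate in Hz), copied from the table.
results = [
    ("Mario Main Menu", 25, 30),
    ("Mario Peach Letter", 11, 13),
    ("Mario Flyby (under bridge)", 10, 12),
    ("Mario In Game", 9, 11),
    ("Mario Kart Nintendo logo", 23, 24),
    ("Mario Kart Flag", 11, 13),
    ("Mario Kart Menu", 11, 13),
]


def speedup_percent(before, after):
    """Relative framerate gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


speedups = {scene: speedup_percent(r5, cur) for scene, r5, cur in results}

for scene, pct in speedups.items():
    print(f"{scene}: {pct:.0f}%")
```
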
(I'll update with results from Zelda shortly - I have to go to a BBQ now!)