Tuesday, June 12, 2007

Tracking down the SSB Dynarec Bug

Yesterday I said I'd provide some more details about the Super Smash Bros. dynarec fix. The actual fix is fairly straightforward, but I thought the process of tracking down the issue was quite interesting and worthy of a couple of blog posts.

When I first started looking at SSB I noted that although the game ran fine without dynarec, it would always hang when trying to enter the main entry with dynarec enabled.

I've been programming professionally for around 6 years now and I can safely say that debugging dynarec bugs is one of the hardest categories of problems I've ever had to work on. For a start, because the code is generated on the fly, you don't have the luxury of source level debugging, and without spending time reverse engineering the original rom image, you don't even know what the generated dynarec code is meant to be doing. It's very much like working blindfolded.

And it gets even worse. I've fixed dynarec problems in the past which were the result of generating incorrect code for a fragment over 500 million instructions into emulation. This would be bad enough, but it can be many thousands of instructions later before this causes emulation finally diverges from the correct path. Just identifying the exact point at which the emulation starts to diverge from the correct sequence of instructions can be like finding a needle in particularly large haystack. While blindfolded :)

Over the years of trying to debug problems like these I've built up a set of tools and learned a few tricks along the way which you might find quite interesting. Although I'm going to talk about them in the context of tracking down this dynarec issue, I've found some of the techniques useful in solving other problems so you might find other ways of applying them too.

One of the first things I do when trying to identify a dynarec issue with Daedalus is to see if the problem is reproducible on the PC build of the emulator. Although it is possible to use GDB with PSPLink, I've never got this up and running and I'm much more comfortable debugging with Visual Studio. Also, working with the PC build is usually much faster than working with the PSP build (debug builds run around 10x faster on the PC, and build times are much quicker.)

Not all dynarec issues can be debugged in this way - the PSP and PC builds have different code generation back-ends (i.e. MIPS and x86 code generation respectively) so bugs in the MIPS code generation won't usually be reproducible in the PC build. The dynarec system in Daedalus shares a common frontend (trace selection and recording) between the two platforms, which means that if I can reproduce the problem on both platforms, I can narrow down the likely location of the bug to this area.

Fortunately this particular bug manifested itself in both the PC and the PSP builds, so I knew that if I fixed the bug on the PC build, it should fix the PSP build too. What I needed to find out next is what the emulator was doing differently when dynarec was enabled compared to when it was disabled.

If dynarec is running without errors, then the sequence of executed instructions should exactly match that executed with dynarec disabled. If I could log details about all the instructions executed with dynarec disabled, and again with dynarec enabled, I should be able to compare the two logs to figure out the exact point at which dynarec is going out of sync. This all relies on the fact that the emulator is totally deterministic, i.e. that running the emulator twice in succession with the same settings should give exactly the same results.

Unfortunately, for a variety of reasons my dynarec solution doesn't produce identical results to interpretation, the main reason being that for performance reasons I can only handle vertical blank and timer interrupts on the boundaries between fragments. For example, with dynarec disabled, the first vertical blank interrupt might occur exactly on the 625,000th instruction, but with dynarec enabled with might not occur until the 625,015th instruction. This means that the logs diverge at the instant the first VBL fires, and never regain synchronisation.

When I was originally developing the new dynarec system I put a lot of effort into writing a fragment simulator, the idea being that rather than executing the native assembly code for a given trace, I could keep track of the instructions making up the trace and interpret these individually instead. Theoretically fragment simulation is identical to dynarec code execution, even down to the way I handle VBLs and timer interrupts, and it's been very useful at identifying bugs in the dynarec code generation. What's particularly useful about fragment simulation however is that I can enable a setting which makes it handle interrupts exactly in the same way as the non-dynarec core, i.e. interrupts are handled precisely rather than on fragment boundaries.

Essentially Daedalus has four modes of operation:


  • Dynarec + fragment execution
  • Dynarec + fragment simulation (imprecise interrupt handling)
  • Dynarec + fragment simulation (precise interrupt handling)
  • Interpretative core


This tool is particularly powerful, because if I can ensure that dynarec+fragment execution is equivalent to dynarec+fragment simulation, and that dynarec+fragment simulation is equivalent to running the interpretative core, then I can use the transitive properties of these relations to ensure that dynarec+fragment execution is equivalent to running the interpretative core. Fragment simulation allows me to bridge the gap between these two modes of operation which would otherwise be very difficult to compare.

I think that's long enough for one post. Tomorrow I'll talk about how I used this technique to help track down the SSB dynarec bug.

-StrmnNrmn

28 comments:

gunntims0103 said...

Great stuff, i simply can't wait for the next build :)

tim said...

umm.... PLEASE RELEASE IT!!!! I love the n64 for psp in fact..... you are the only one even trying to create something as complicated as the n64 emulator on psp

john said...

Thx m8 that teaches a lot. its actually interesting to hear how this works. Oh and the big words dont scare me, so good instruction writing ;). thx again mate, lookin forward to next lesson.

Katarat said...

wow, I loved that read, thanks for the update. im scared to admit, but I actually understood everyword you said. ur finds, problems, and solutions become very interesting and is more than enough to keep me satisfied until an actual release that i can try out. :)

Proxxxy said...

Very interesting post about the whole thing making me better understand whats going on behind the scenes. Good luck finding more bugs this way ;)

Thierry said...

I can only say one thing: You are Really Good

Henry said...

Just want to say that I appreciate the awesome work you've done on this emulator and that you've inspired me to learn how to program. Is Python a good starting point? I've heard that it is.

Andrew said...

# Dynarec + fragment simulation (imprecise interrupt handling)
# Dynarec + fragment simulation (precise interrupt handling)

I do not understand how those can be equivalent(and thus fragment execution equivalent). I am thinking 'imprecise' could change the timing of the interrupt handler. Correct or incorrect?

Thanks a lot for the article, that certainly seems like a harder debugging problem than most. It seems you just need the right tools, as usual. I would think it worthwhile to eventually get GDB and psplink up and running, as with my experience with embedded systems and RTOS's leads me to believe this is the best way to debug this sort of thing(I have worked for NASA/AMES) because those certain bugs that appear on the real system which do not appear on the simulation or alternate environment.

tehmalakai said...

Very insightful, I really look up to people like you :)

peter said...

thank you for technical details, it's always very interesting reading

Kdo4971 said...

In your long and very precise explanation, you refer to a "PC build" version of Daedalus, as a key element to debug.


Maybe it's only me, but I've not seen any Daedalus on PC since a long time ago. Is that a dev-only version, or is it also downloadable, like the PSP port ?

Cueca said...

:O

I'm studying Computer science, and I think that I'll never work in a emulator... i dont know if i could do things like this.

this would be painful for anyone that tries, but a very very rewarding experience :) and so, I admire your work. :)

maybe i can try in the future...

thanks for Daedalus, the technical posts and the motivation. and also sorry for my bad english

:)

joshua said...

Good luck man, I'll be eagerly awaiting your next blog posty thingy...

mkbin said...

so does this mean that most games that like to crash in the menu will be fixed

Steambrite Supply said...

While I can't say I've actually programmed anything of any significance in the past, I still have to wonder:

Couldn't you compare the log files of the emulation with, and without the dynarec in place, and see what the code is doing differently, and just locate where the river forks, as it were?

StrmnNrmn said...

@henry. I've never spent much time with Python, but I think it would be a good starting point for a beginner because it seems to have a very active community.

StrmnNrmn said...

@andrew.

You're right, I kind of glossed over that part of my explanation (I thought the post was getting long enough as it was :)

I used the word 'equivalent' (rather than 'identical') because they do both lead to different execution (due to changes in the timing of interrupts as you mentioned).

In the case of Daedalus, the timing of interrupts is generally very imprecise anyway - for instance I make the assumption that all instructions execute in 2 cycles which isn't particularly accurate.

Because of this firing the vbl interrupt a few hundred cycles early or late doesn't really affect the execution of most roms, so 'precise' and 'imprecise' interrupt handling are essentially equivalent for my purposes.

The same assumption probably won't hold for all dynamic code generators/emulators though :)


You make a good point about GDB though - it's definitely something I need to take another look at next time I'm trying to debug a problem like this which only reproduces on the PSP.

StrmnNrmn said...

@kdo4971: I still maintain the PC version for my own testing. Given there are better n64 emus already available on the PC (e.g. pj64) I've never really thought it worthwhile releasing new versions. If there's enough demand for it (other than just curiosity) then I'll have a look at releasing PC builds alongside PSP builds.

StrmnNrmn said...

@mkbin: Maybe - it will certainly help a few other roms, but I've not done any thorough testing yet.

StrmnNrmn said...

@streambrite_supply: Wait until the next blog post and all will be revealed :)

Danny said...

Awesome stuff strmnrmn. I for one would love to see simultaneous psp and pc releases of daedalus :D i was dissapointed you didnt keep up to the pc version and it seems you have! just haven't released it. I really hope you release the pc builds as well as the psp ones as its my favourite emulator ever made :)

Richard said...

StrmnNrmn, Please read this :)

I would just like to say that I tried this emulator around R3 or R4 and I have to say I had doubts. But from all the blog updates I must say that I think you are an amazing coder and I have no doubts at all anymore that with time you can get some close-to-perfectly emulated games. OK, that's all I wanted to say :). Keep up the excellent work.

Annonymous_Coward said...

This is one of the most exciting Psp emulation projects I have ever seen. I just signed up to say thanks for all of your hard work. It is extremely interesting to read the technical aspects of the code and I look really forward to R12. Full speed N64 emulation on the Psp means that my Psp will always be by my side. Never give up please!!

James said...

StrmnNrmn,

Apologies if this is a stupid question or if you've already addressed it in an earlier post.

I assume that your dynamic recompilation is just that - i.e. dynamic. Would there be any advantage to doing some static recompilation of [some of] the original ROM at load time rather than runtime? If so that state could obviously be saved and loaded the next time instead of the original ROM.

One other point: you might find that by releasing a PC build you get more interest and (hopefully cross-platform) code contributions. I guess there more people happy writing x86/Windows code than PSP/MIPS. :-)

StrmnNrmn said...

@james: It's an interesting idea, but the dynamic recompilation is so fast there's very little benefit to be gained from caching it (last time I checked, less than 1% of the time spent in the emulator is taken up by code generation).

Accessing the memory stick is actually pretty slow, so loading 64KiB (for example) of cached code from the memory stick may well be slower than just regenerating that code on the fly.

I'll definitely give the PC build a bit more thought though :)

Samuel said...

Strmn_Nrmn, Daedalus simply rocks. And that's not just something I said, it even shook the boundaries of what my older brother thought possible, and he's a gaduate student in computer science and one of the top heads behind phpbb.

A w e s o m e w o r k.
Please keep it up.

Zeus said...

I know you've probably been bugged by people on this before , but how hard would multi-player be to implement?
More importantly, Do you think that there is enough bandwidth, and low enough lag, to allow a host-client multiplayer setup to be playable? (ie, with one psp "hosting" the game and doing the emulation, while the other(s) just receive screen-captures and send back user-input) Or do you think that distributing the computational workload would be a better approach? (at the very minimum, the audio processor shouldn't be too horrible to move to the client psp(s) )

schibo said...

Hey StrmnNrmn,

One trick I used which might help to debug:

I used to run the dynarec and the interpreter in lock-step--so for this debugger, there were 2 cores of dyna registers/memory and interpreter registers/memory.

One dyna opcode would be executed and then written out to compare memory..keeping track of the dyna register map to restore the dyna state after the interpreter op is executed. Basically you compile a bit of extra debug logic after each opcode so the states can be compared, followed by compiled code to restore the dyna state as if the debug code was never there.

When each state could is compared --dyna and interpreter, you only need to compare what is modified at each opcode.

Then when you run the game, you can pop a messagebox which will tell you exactly where the cores diverged.

I hope I said that clearly enough that this can perhaps help you in the future :).

schibo