Tuesday, March 20, 2007

A look inside Daedalus

(This is quite technical post, probably only of interest to other C++/PSP developers.)

Since I fixed the issue stopping the Expansion Pak support from working, I've been testing daedalus with the feature enabled to see if the added pressure on available RAM causes any new issues. Most roms that I've been tested have been running perfectly, but I've been experiencing occasional crashes through running out of memory. I believe this is due to a small leak somewhere, so in an effort to track it down I've been improving the tools I use to track memory usage.

In the improved tracker I override the global new and delete operators, which lets me perform some logging on every allocation/deallocation made during the course of executing the emulator (this isn't quite true, as I don't currently log calls to malloc etc, but it's good enough for my purposes). At the start of each overriden implementation, I keep track of the calling function's return address with the following snippet of code:


u32 ra;
asm volatile
(
"sw $ra, %0\n"
: "+m"(ra) : : "memory"
);


I log this return address along with the allocation size and the returned pointer for every call to new and new[]. For calls to delete and delete[], I just log the address of the memory being freed and the caller's address. This data is all logged to disk across the USB connection using PSPLink.

The logfiles look a little like this:


Allocating 36 bytes - 09c40620 - RA is 08953a94
Allocating[] 8192 bytes - 09c40a00 - RA is 0895e8bc
Allocating 12 bytes - 09c3fce0 - RA is 08964898
Allocating 12 bytes - 09c40970 - RA is 089648b0
Allocating 12 bytes - 09c409d0 - RA is 089648c8
Allocating 12 bytes - 09c408e0 - RA is 089648e0
Allocating 20 bytes - 091ceb20 - RA is 089645e8
Freeing 09c3fce0 - RA is 08962374
Freeing 09c40970 - RA is 089621fc


In order to analyse the results I've written a PC tool which scans through the logfile one line at a time 'replaying' the allocations and deallocations in the same order that they occured on the PSP. The analyser keeps track of the current state of allocations at any point in time, matching up any calls to delete with the corresponding call to new. This means that at any given time the tool has a complete list of all outstanding allocations.

I can discover any memory leaks by shutting down the emulator and continuing to record the logfile while it frees up all allocated resources. By replaying the logfile the analyser can identify any leaks, as these are (mostly) the only remaining allocations at the end of the logfile. I can then run the corresponding return addresses through psp-addr2line to discover where the leaked memory is being allocated from.

The other cool feature of the tool is that it builds up a graphical representation of the state of memory allocations at any point in time. This is really useful for figuring out where all the available RAM is going.

Here's a picture showing where most of the PSP's memory is being used while emulating Mario 64 (I've added the labels on by hand, the tool doesn't do that :)



Each pixel corresponds to 16 bytes of RAM. The smallest blocks are 64x64x16 bytes = 64KB. 1MB chunks are formed from 4x4 64KB blocks. The black space corresponds to both unallocated memory, and also memory allocated outside of the tracker (e.g. calls to malloc, memory used by PSPLink, the CRT, static data areas etc.) You can see that the "Emulated RAM" accounts for just over 8MB - this is because I enabled the Expansion Pak while testing. Dynarec currently uses about 6MB - Ideally I'd like to reduce this down to around 4MB soon.

You'll notice that almost all the memory Daedalus uses is for these big fixed-size allocations. What little dynamic allocation it does at runtime is limited to:


  • Keeping track of hot-trace hit counts in the Dynamo implementation
  • Textures and texture caching


As it turns out, the out-of-memory issues I've been having have been due to the texture cache going crazy and chewing up around 3MB of memory (typically it just uses 200-300KB or so). I've not figured out the root cause yet, but the tool has helped point me in the right direction.

All in all I think this is a pretty nifty utility as it stands, but I've been thinking about a few features that would make it even better:


  • Log the time and current frame alongside each allocation/deallocation. The analyser can then use this information to see how much 'churn' there is over time. Minimising this should help improve performance and reduce fragmentation.
  • Every memory allocation has a small housekeeping overhead (for alignment, keeping track of the allocation size etc). The tool could generate a histogram of allocation sizes to demonstrate how much memory is being wasted through tiny allocations, and give some indication of where pooling or freelists might help.


These are probably features for some point down the line however :)

That pretty much sums up the tool. If this code (either for the tracker or the logfile analyser) would be useful to anyone, let me know and I'll add it to Subversion alongside the rest of the Daedalus code with R10.

StrmnNrmn

7 comments:

Morgan said...

Nice alot of technical stuff but it sounds great StrmnNrmn, shows me you certainly know what your doing; not that I didn't know before this. Anyway I'm glad you are updating us frequently, it's very good to be able to read something new from you from time to time. All the luck with R10 I know it's going to be awesome!

Anonymous said...

I didn't realize you were using a cache for all roms, I thought it was only for games that couldn't fit in the PSP's memory. I guess this is the reason Mario 64 slows down everytime it has to load a new area, correct? In the future will you allow smaller (8 MB) roms to load entirely into memory? I assume this would eliminate the slowdown in instances where it has to access the cache file.

TerryLynch said...

Great work StrmnNrmn! I dont know anything about coding but i was curious when i saw the "rom cache" on the graph.
Is there any way that daedalus could use a cache system similar to that used by NJ (the coder of the cps1,cps2 & neogeo emulators on psp)?
You probably know the drill, one creates a cache file from the rom and stores it on the memstick.
Tweaking the cache system allowed him to eventually get the emulators running fullspeed (originally they were quite laggy)at the expense of using up more memstick space (which most people are willing to give).
Again, i know nothing of emulators and i'm sure if it could work you would have thought of it already! Anyway, i love the work. Best of luck!

PSdonkey said...

That's a nice tool you got there to track down memory leaks. Memory leaks can be a nasty thing to track down and pinpoint especially with a large scale code such as yours. Hopefully you will make it available through sourceforge so we can all use :)

StrmnNrmn said...

@adam

Unfortunately if you look at the memory map, there's currently very little RAM left - certainly not enough to spend another 6MB in order to cache the ROM entirely in memory :(

In any case, I think the stalling that you're seeing (on entry to a new world in SM64 etc) is almost entirely due to the dynarec engine recompiling all the new code that's being executed.

@terry

I'm unfamiliar with the caching in the cps1/cps2 emulators, but I believe these roms are encrypted so I'm guessing the caching is for decrypted data.

n64 roms are totally unencrypted so I'm not sure this technique would be applicable to Daedalus. If you've got any links handy I'd be happy (and quite interested) to check it out though :)

@psdonkey

I'll be sure to clean up the source and submit it alongside the R10 release :)

TerryLynch said...

Hi StrmnNrmn, just to follow up on your response. You're probably correct about the cps2/neogeo roms being decrypted but i'll give a link anyway if you're interested.
This is a link to a guide which also contains a link to NJ's site (in Japanese!) with source code. Hope i'm wasting too much of your time!

Anonymous said...

Disabling the expansion pak would give you an additional 4 MB to work with though, is it not possible to use this towards the cache?