Wednesday, May 31, 2006

Some initial benchmarks

I've been really busy working on the new dynarec engine, so I've not been posting as frequently as I'd like. I've made a lot of progress in the following areas:

  • Most integer arithmetic and logical instructions are now implemented (i.e. I'm now generating optimised assembly for these instructions rather than calling a generic function to handle them)

  • Register caching implemented (although I'm only using a greedy allocation algorithm at the moment, as I've not yet fully implemented the fast linear scan algorithm I talked about in the previous post)

  • I'm directly linking all direct branches to compiled fragments

  • I'm linking to all indirect branch targets

So far I'd say I'm around 40-50% through the work on the dynarec engine.

Now for some stats :) The following table compares the framerates at various points (previous framerate is for the R4 release of Daedalus, current framerate is for my most recent development build):

Scene                      | Previous Framerate (Hz) | Current Framerate (Hz)
Mario Head                 | 3                       | 6
Mario Main Menu            | 14                      | 25
Mario Peach Letter         | 6-7                     | 11
Mario Flyby (under bridge) | 6                       | 10
Mario In Game              | 5-6                     | 9
Mario Kart Nintendo logo   | 10                      | 23
Mario Kart Flag            | 6                       | 11
Mario Kart Menu            | 7                       | 11
Zelda Nintendo Logo        | 20                      | 23
Zelda Start Menu           | 2-3                     | 4
Zelda Main Menu            | 10                      | 13

Overall I'd say the dynarec is currently achieving up to a 100% speedup in the roms I've tested, which I'm very excited about. Mario is certainly starting to feel a lot more playable, and the Mario Kart menus are a lot more responsive now.
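To make the percentages concrete, here is a small sketch of the arithmetic behind the table. The function name `SpeedupPercent` is made up for illustration and is not part of Daedalus. With the figures above, Mario Head (3 to 6 Hz) works out to 100%, the Zelda Nintendo logo (20 to 23 Hz) to about 15%, and the Mario Kart Nintendo logo (10 to 23 Hz) to about 130%.

```cpp
// Percentage speedup when the framerate goes from previous_hz to current_hz.
// A result of 100 means the framerate has doubled.
// (Hypothetical helper, only for checking the table by hand.)
constexpr double SpeedupPercent(double previous_hz, double current_hz)
{
    return (current_hz / previous_hz - 1.0) * 100.0;
}
```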

I specifically included Zelda in the results because I'm not seeing the same kind of results there, so I need to take a closer look at what's going on there (it's quite possible it's just using a few of the arithmetic and logical ops I've not spent time optimising yet).

A twofold improvement in framerate is pretty good, but I now think I can do a lot better. Here's the list of things I currently have on my 'TODO' list:

  • Fully implement all the remaining integer ops (including all the 64 bit instructions)

  • Finalise implementation of the fast linear scan register allocation algorithm

  • Keep track of 'known' values for specific registers and use this to optimise the generated code (e.g. most of the time the top half of the N64's 64 bit registers is just sign extended from the lower half)

  • Cache the memory location pointed to by the N64 stack pointer (SP) and optimise load/stores using this register as a base pointer

  • Optimise all memory access instructions (currently all the cached registers get flushed for all memory accesses other than LW/SW/LWC1 and SWC1)

  • Detect and optimise 'busy wait' loops (e.g. many roms sit in a tight loop waiting for the next vertical blank interrupt to fire which is just wasting cycles on the PSP)

  • Implement all the branching instructions (I've currently only implemented BNE, BEQ, BLEZ and BGTZ)

  • Implement instructions and register caching for all the cop1 (floating point coprocessor) instructions. (I think this will give a huge speedup.)
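As a rough illustration of the 'known values' item in the list above, the sketch below tracks, for each N64 register, whether its upper 32 bits are known to be just the sign extension of the lower 32. This is only a guess at how such bookkeeping might look, not the actual Daedalus code: the class `SignExtensionTracker` and its method names are invented for the example. It relies on the fact that 32-bit MIPS operations (ADDU, LW, LUI and so on) always produce a sign-extended result, and that a bitwise op on two sign-extended values is itself sign-extended.

```cpp
#include <array>

// Hypothetical per-fragment bookkeeping, one flag per N64 general register.
class SignExtensionTracker
{
public:
    static constexpr unsigned kNumRegisters = 32;

    SignExtensionTracker()
    {
        mKnown.fill(false);
        mKnown[0] = true;   // r0 is hardwired to zero, so trivially sign-extended
    }

    // 32-bit ops (ADDU, SUBU, LW, LUI, ...) always write a sign-extended value.
    void Note32BitResult(unsigned rd)
    {
        if (rd != 0) mKnown[rd] = true;
    }

    // 64-bit ops (DADDU, LD, ...) may leave anything in the upper half.
    void Note64BitResult(unsigned rd)
    {
        if (rd != 0) mKnown[rd] = false;
    }

    // AND/OR/XOR/NOR of two sign-extended inputs is itself sign-extended.
    void NoteLogicalResult(unsigned rd, unsigned rs, unsigned rt)
    {
        const bool known = mKnown[rs] && mKnown[rt];   // read before writing: rd may alias rs/rt
        if (rd != 0) mKnown[rd] = known;
    }

    // When true, the code generator could skip emitting code for the upper half.
    bool UpperHalfIsSignExtension(unsigned reg) const
    {
        return mKnown[reg];
    }

private:
    std::array<bool, kNumRegisters> mKnown;
};
```

A code generator could consult `UpperHalfIsSignExtension` before emitting the upper-half work for an instruction and fall back to the full 64-bit path whenever the flag is false.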

Although the list is quite short, there's quite a lot of work there. What I'm quite excited about is that I think these changes will start to provide significant speedups as they're implemented. I don't want to get too far ahead of myself, but I'm starting to feel that certain roms are going to be very playable in the not too distant future.

I'm going to try and release a new version of the emulator soon. Unfortunately it's probably not going to be this weekend (due to various social commitments); towards the end of the following week is more likely. I'd certainly like to get a version released before the World Cup starts and all my free time is taken up watching football :)



wally*won_kenobie said...

Rock on.

I'm willing to do some beta testing for you as I do for PSmonkey.

add to ur MSN listing if you have one

wally*won_kenobie said...

Also is there going to be sound support in the end?

psp726 said...

Fantastic job! You are doing great! Keep it up. Thanx so much for all your hard work. We will never forget all your hard work you put in to this.

PSDroideka said...

I check this site pretty much every day, I'm that into the next release. I just wanna know: after the World Cup and performance update, will things like save and GUI be implemented? I sort of assume sound will come with performance.
Keep up the good work :P

kekpsp said...

Great work, this will send shockwaves down the homebrew community, Mario64 at playable speeds, I can't wait.......Thanx :), if you need any feedback on FPS, playability, glitches and crashes I am willing to give you my full support.

kersplatty said...

woo sounds like you're making some sexy progress, we'll be able to play mario 64 whilst watchin peter crouch dance like a robot, cant w8 for next release good luck!

_Psycho said...

Always interesting as always :)

I was wondering, let's say you finish your dynarec and you are around 60-90% of the original speed (like 22-27fps for mario for example instead of 30). Do you have any plans to finish the speedup? Like taking some texture and cache functions and rewriting them in mips asm instead of c++? Would that give you an extra boost, or does the way you wrote your dynarec already take care of that, so it would be useless to rewrite some parts in mips asm?

Anyway, I really enjoy following the technical notes. I can't wait to dig into the source code to see the changes.

Linkzie said...

This is great, nice reading (:, looking forward to the next release *can't wait*

LaMa said...

Mighty impressive update :)
Once again, very good work!

I'm looking forward to the next public release. But by all means, take your time and don't rush yourself.



Exophase said...

Question about one of your optimization ideas. How can you cache the stack pointer's memory region when you don't know at compile time if stack relative accesses will be in the same page? Or is the stack usually in unpaged memory? If the latter is the case I would assume you're referring to the hardware memory region, and not the virtual memory space (but this should always be RDRAM, right?)

PSDroideka said...

Just in case no-one reads the other post, due to all the stuff laxer3a said, it sounds like he should make a 64 emu lol

Laxer3A said...


1/ In my previous post I hadn't considered that the N64 MIPS instruction set is different from the PSP's MIPS.
(64 bit reg and so on...)

So direct code "translation" seems to be a bit harder.

2/ I actually have a lot of other projects and also a very busy job.
I believe StrmnNrmn is very skilled too and does not need anybody like me. :-)
I was just making some comments about potential implementations.

I would just hope that StrmnNrmn would try to discuss more on this blog, so we could develop optimization ideas. That would be fun. But I bet he wants to reach his goal fast without losing too much time (= when home == coding, not internet thingy).

PSDroideka said...

I didn't mean StrmnNrmn needed help, he's well and truly proved that :)

StrmnNrmn said...

wally: There will be sound support, but I think speed and compatibility are more important at the moment (any audio will sound horrible until the emulator is running at close to full speed).

insert display name: Save is definitely a big priority. It shouldn't be too hard to get working, but again I don't think it's a priority until some of the compatibility and performance issues are addressed. A better GUI is definitely required too (if just to allow the controller to be reconfigured on a rom by rom basis)

_psycho: I think there is a lot of scope for optimising other parts of the emu specifically for the PSP. Certainly the texture decompression etc could be heavily optimised for the PSP. At the moment the CPU emulation is taking the majority of the time, so that's what I'm focussing on. Hopefully once the dynarec work is finished it should be more obvious where to look at improving next.
PS- I'll take a look at committing my changes to the CVS repository today if you fancy looking through the code.

exophase: Usually the stack is in physical RAM, so there are no issues with paging etc. Some games do (annoyingly) have the stack in virtual mem, so this optimisation probably wouldn't work for them. It would have to be toggleable from the .ini file to work I think.
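A minimal sketch of the address check this implies, assuming the emulator keeps RDRAM in one contiguous host buffer. The function name `HostPointerForUnmapped` and its parameters are made up for the example and are not the Daedalus API. On MIPS, the KSEG0 (0x80000000-0x9FFFFFFF) and KSEG1 (0xA0000000-0xBFFFFFFF) segments bypass the TLB and map straight onto physical memory, so an SP in either range can be turned into a host pointer once and reused. Endian handling is deliberately ignored here.

```cpp
#include <cstddef>
#include <cstdint>

// Returns a host pointer for an N64 address in the unmapped KSEG0/KSEG1
// segments, or nullptr if the address is TLB-mapped or falls outside RDRAM.
// `rdram` and `rdram_size` describe a hypothetical host-side RDRAM buffer.
inline std::uint8_t* HostPointerForUnmapped(std::uint8_t* rdram,
                                            std::size_t rdram_size,
                                            std::uint32_t n64_address)
{
    // The top two address bits are 10 for both KSEG0 and KSEG1.
    if ((n64_address & 0xC0000000u) != 0x80000000u)
        return nullptr;                       // mapped (virtual) address: no shortcut

    const std::uint32_t physical = n64_address & 0x1FFFFFFFu;
    if (physical >= rdram_size)
        return nullptr;                       // not backed by RDRAM

    return rdram + physical;
}
```

If the result is non-null for the current SP, a load or store at SP+offset could use the cached pointer plus the offset directly; a null result would mean falling back to the generic memory access path, which is where a per-rom .ini toggle would come in.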

StrmnNrmn said...

laxer3a: Sorry I didn't get around to replying to your previous post - you raised a really interesting approach that I'd not given much thought before.

I think you spotted the same problem I had thought of, i.e. the PSP has a slightly different instruction set from the N64 (the 64 bit instructions, as you mention). The other problem is that the N64 is big endian whereas the PSP is little endian, so all 1 and 2 byte load/stores need to be fiddled to get working. I think it might be possible to get a 'direct' translator working like you suggest, but I think there would end up being a lot of hacks and special cases etc. Ultimately I think that a full dynamic translator is going to be the easiest approach (plus I can also share most of the code for the 'front end' of the translator with the PC version :)
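One common way to do that 'fiddling', sketched here as an assumption rather than as a description of what Daedalus actually does: keep every 32-bit word of N64 memory in the host's native (little endian) order, so aligned word accesses need no work, and XOR the low address bits for the narrower accesses. The function names are invented for the example, and the code assumes a little endian host and naturally aligned addresses.

```cpp
#include <cstdint>
#include <cstring>

// `ram` holds N64 memory with each 32-bit word stored in host (little endian)
// byte order. All names here are hypothetical.

// Aligned word access: nothing to fiddle.
inline std::uint32_t Read32(const std::uint8_t* ram, std::uint32_t address)
{
    std::uint32_t value;
    std::memcpy(&value, ram + address, sizeof(value));
    return value;
}

// Halfword access: the two halfwords of each word have swapped places.
inline std::uint16_t Read16(const std::uint8_t* ram, std::uint32_t address)
{
    std::uint16_t value;
    std::memcpy(&value, ram + (address ^ 2u), sizeof(value));
    return value;
}

// Byte access: the four bytes of each word are in reverse order.
inline std::uint8_t Read8(const std::uint8_t* ram, std::uint32_t address)
{
    return ram[address ^ 3u];
}
```

For example, if the N64 sees the bytes 12 34 56 78 at addresses 0 to 3, the host buffer holds 78 56 34 12, and `Read8(ram, 0)` still returns 0x12 while `Read16(ram, 2)` returns 0x5678.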

You're right when you say that I'm trying to achieve my goal quickly without 'wasting' too much time. My job takes up quite a lot of my time, so I only get a few hours at home in the evenings during the week. It usually takes me a couple of hours to go through all my email and update the blog etc. I usually only ever end up doing it once a week so I can spend as much time developing as possible, but I'm aware that it's important to keep people informed as to what's going on. I also enjoy talking to you guys so I'll try and squeeze in a few smaller updates when I get the chance :)

Exophase said...

Other N64 emulators have used byteswapping before, presumably to address the endian issue, although I'm not sure how this actually helped anything, since reading/writing bytes and halfwords would provide an inconsistent view of memory. I've used byteswapping, but it was on a "platform" that only supported full word memory accesses.

I don't know if you're already using these or not, but for manual byteswapping MIPS32r2 processors have some additional instructions that should help: a two-instruction sequence can byteswap a full word. It's not as good as what's available on PPC, but it's a lot better than doing it the traditional way (the following is taken from the programmer's manual PDF):

lw t0, 0(a1) /* Read word value */
wsbh t0, t0 /* Convert endianness of the halfwords */
rotr t0, t0, 16 /* Swap the halfwords within the words */
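For readers who don't have the manual to hand, the effect of the `wsbh` and `rotr` pair above can be modelled in portable C++ as follows. This is only an illustration of what the two instructions compute; the function names `Wsbh`, `Rotr` and `ByteSwap32` are chosen for the example.

```cpp
#include <cstdint>

// Models `wsbh`: swap the two bytes inside each 16-bit halfword.
constexpr std::uint32_t Wsbh(std::uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

// Models `rotr`: rotate right by n bits (n must be in the range 1..31 here).
constexpr std::uint32_t Rotr(std::uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

// The two-instruction sequence: a full 32-bit byteswap.
constexpr std::uint32_t ByteSwap32(std::uint32_t x)
{
    return Rotr(Wsbh(x), 16);
}
```

Taking 0x12345678 as input, `Wsbh` gives 0x34127856 and the 16-bit rotate then gives 0x78563412.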

Mikeyd said...

hey I'd love to do some testing for you, emails is if u wanna contact me about it, this emu is probably my second most used homebrew, keep it up! :)

_Psycho said...

You know I just realised you were on sourceforge and that you were using the CVS, I thought you were only releasing the source code with every release ;) Good idea there, I'll give it a look later this week.

Mikeyd, if you really want to test badly, get the latest source code from the CVS, compile it and check the result ;)

BigMace said...

This is great news, keep up the work.

I think it would really add to the emulator if you added a gui similar to the nesterj or older snes9xtyl and not some simple one. Oh and it would be nice if you added the option to let people choose their own rom path.

Also if you could at least add the 4k eeprom saving in the next release, because it sounds like mario 64 is getting to a playable state and it would be fun if we didn't have to restart every time. Maybe psmonkey could help you with this because he has 4k eeprom saving on his emu.

thanks for your work,

Kramer said...

Great news
keep up the good work man cant wait for r5
i was just wondering if more roms were going to be supported in this release and if the textures are going to be fixed.

PSDroideka said...

Thanks for the info, I look forward to an exceptional release ;)

wally*won_kenobie said...

Have you tried Quest 64 yet?

That seems to be the fastest game on Monkey64

Would like to see benchmarks

Laxer3A said...

StrmnNrmn, thanks for your answer.

Basically the problem is that when you translate ONE N64 MIPS instruction, it results in MULTIPLE PSP MIPS instructions, with a lot of different subcases.

The "best" way actually would be to formalize the N64 instructions into a graph, as used in compilers. Basically roll the ASM instructions back into chunks of "virtual" micro instructions...

Once the chunk of code has been completely "virtualized", you pass all your graphs/trees through various optimizing filters, which will reduce the size of the trees or make them more efficient to turn into MIPS instructions on the target device.

I know it is a bit of overkill (that's why I put best between "").
That's how compilers do their job, but we definitely agree that it is costly for real-time stuff and a limited CPU platform.

MIPS -> MIPS is still close enough to do that in a simpler way.
Some "special cases" handled in a nice way can probably do as well at a cheaper cost.

Anyway, there are always hundreds of ways to solve problems, depending on the trade-offs, so...

The problem with doing this in the SNES emu is that the architecture is SO different: the rich CISC addressing modes on each instruction are a real pain, the getter/setter code generation is annoying, and finally you need to detect whether the code is in RAM or ROM to avoid self-modifying code issues.

In the case of TYL, it isn't worth it (dev time vs benefit).
In your case, it is definitely a requirement.

If you translate your GPU calls quite fast, the next bottleneck is audio and CPU... Definitely worth it then.

Anyway, if you have time, drop a mail :
I really enjoy discussing this kind of stuff.

flyinghippo said...

I'm very pleased to see progress in this emulator. Creating an N64 emulator used to seem very hard to do, and I'm sure it still is hard work, but you're still doing your best to make this possible. Remember, always work at your own pace, just so you don't get caught up in too much work. Keep up the good work, I can't wait to see more progress.

StrmnNrmn said...

_psycho: Just to let you know I've updated the CVS repository with all my recent changes- I'll post a small new entry about this so it's a bit more visible.

bigmace: 4k/16k eeprom support should only take a few minutes to get working - most of the logic is all there from Daedalus PC. I just need to sit down for 10 minutes and hook up the load/save to memorystick on the PSP. I'm holding off for a short while as I want to double check the 'fileformat' is compatible with various other emulators (i.e. so people can download and share their saves)

kramer: I will fix a few of the more obvious glitches that I've come across. R5 is primarily going to focus on performance though, so there's unlikely to be much in the way of graphics or compatibility fixes (maybe I'll spend a week concentrating on this for a quick R6 release).

wally: It does run very fast. The problem is there are some nasty graphical glitches which make it impossible to see what's going on when you get in game. I'll take a look at fixing this soon :)

laxer3a: Sounds like a really interesting idea. In the past I've had a thought about treating the n64 asm as an arbitrary program fragment, converting it into SSA form and then applying various optimisations from there (e.g. lots of peephole optimisations should become very easy at this point). Obviously if you did this you'd have to make sure that the overhead of your optimiser didn't slow everything down too much though. Will drop you a line sometime this week - would be good to chat about this in a bit more depth.

vettacossx said...

so approximately when can we expect r5, since the stat comparison shows it's more than NEW VERSION status ;) thanks, just anxious :)