Wednesday, August 16, 2006

Unexpected optimisations

One of the things that I find most rewarding about programming is discovering an unexpected improvement or optimisation by accident. You can spend weeks carefully tuning and optimising code, only to stumble across a glaring inefficiency which you've never spotted before. One quick change and your application is suddenly noticeably faster.

In my daily job I rely heavily on debuggers and profilers to discover bottlenecks in the code. When I was working on the Xbox, Microsoft provided some excellent performance analysis tools (I see they've finally released PIX for Windows). These days I tend to use AQtime as I'm PC based (it's also one of the few profilers I've found that can handle the size of our libraries at work without grinding to a shuddering halt.)

Without these kinds of tools it's a lot tougher profiling on the PSP. Over the past few months I've built a number of custom profiling tools into Daedalus to help me figure out where all the time is going, but the numbers I get out tend to be quite vague, and there's usually quite a large margin of error. I think this explains why the unexpected optimisation I've just found went undiscovered for so long.

A couple of days ago I was browsing the ps2dev forums and came across this post. I was about to back out after a quick scan, when I noticed this comment from Soatome:


PeterM wrote:
but one waits for the vblank

...and that's sceCtrlReadBufferPositive (which you're using)
you should use sceCtrlPeekBufferPositive instead.


That's when I realised that when Daedalus was emulating a rom, it was stalling for a frame every time the rom read the status of the pad*. In other words, by changing one line of code in Daedalus from


sceCtrlReadBufferPositive


to


sceCtrlPeekBufferPositive


I could get an instant speedup of around 1fps on average across all roms. What's more, I knew some roms read from the pad multiple times each frame, so they would see an even greater speedup.
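To make the cost concrete, here is a minimal sketch of why a blocking read hurts more the more often a rom polls the pad. It does not use the real pspsdk: FakePad, fake_read_blocking, fake_peek and vblanks_stalled are hypothetical stand-ins that only model the behaviour described above (one call waits for a vblank, the other returns straight away).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the pad: NOT the real pspsdk types or calls. */
typedef struct {
    unsigned vblanks_waited; /* vblanks spent blocked inside pad reads */
    unsigned buttons;        /* most recently latched button bits */
} FakePad;

/* Models the blocking behaviour described for sceCtrlReadBufferPositive:
   every call stalls until the next vblank before returning the state. */
static unsigned fake_read_blocking(FakePad *pad)
{
    assert(pad != NULL);
    pad->vblanks_waited += 1;
    return pad->buttons;
}

/* Models sceCtrlPeekBufferPositive: returns the latched state immediately. */
static unsigned fake_peek(const FakePad *pad)
{
    assert(pad != NULL);
    return pad->buttons;
}

/* Emulate `frames` frames in which the rom polls the pad `reads_per_frame`
   times each, and report how many vblanks were lost to waiting. */
static unsigned vblanks_stalled(unsigned frames, unsigned reads_per_frame,
                                int use_peek)
{
    FakePad pad = { 0u, 0u };
    unsigned f, r;

    for (f = 0; f < frames; ++f) {
        for (r = 0; r < reads_per_frame; ++r) {
            if (use_peek)
                (void)fake_peek(&pad);
            else
                (void)fake_read_blocking(&pad);
        }
    }
    return pad.vblanks_waited;
}
```

In this model, vblanks_stalled(60, 2, 0) reports 120 vblanks lost to waiting, while the same call with use_peek set to 1 reports none. The real cost on hardware depends on where in the frame each read lands, so treat the numbers as illustrative only.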

Frustratingly I had to wait a couple of days before I could try this out. As I mentioned earlier I'm in the process of moving over to a new PC, and I had just moved Perforce over but hadn't set up the pspsdk, which required Cygwin. Daedalus requires libpng and zlib so I had to download and build them too. Then I had to set up Psplink, PuTTY and a whole host of other tools. You get the picture...

Last night I finally managed to get a new build together with the updated code, and the results were every bit as good as I'd expected. In some cases I had to restart the rom just to make sure I wasn't mistaken. I know most of you just want to see some numbers, so here's a few of my observations:

Mario now runs at a steady 15fps in most places, and around 20fps indoors etc (it reaches over 35fps in the main menu, and close to 30 in some scenes.) Zelda now runs at around 8fps in game, and up to 20fps in certain places. The 'nintendo' logo at the start runs at over 90fps :D The MarioKart Nintendo logo now runs at 30fps, and the main menu (with the flag) runs at a solid 15fps. In game it's a comfortable 12fps. Starfox runs at around 15fps - the intro runs at 25-30fps. Quest64 runs at 20fps.

So all in all it's a pretty amazing improvement for a single-line change. Having said that, I think it would be a mistake to assume that this is an instant fix that will suddenly make everything fully-playable. Although some of the framerates I list above are excellent - faster than a native N64 even - not all roms show this improvement. Don't assume that all roms now run at 15+fps (because they don't.) There's still a lot more work to do to get from a sluggish 8fps to a more playable 15fps (in Zelda for instance). I still need to save a lot more cycles in order to support other features such as sound.

Because this change makes such a big improvement I'm going to try and get another release out sooner rather than later. I don't like releasing builds too often as I think each revision should add something worthwhile, but I think this qualifies :) There are a couple of other optimisations I want to get in this build, so while it might be ready this weekend, sometime early next week is more likely. The new features I had planned for this build will have to wait until R9.

As always, I'll keep you posted.

-StrmnNrmn

*This actually reminds me of a funny story from one of the Xbox games I was working on. We were investigating a sudden slowdown that had appeared a few days previously. Somehow I realised that the framerate doubled when you unplugged all the controllers. As it turned out someone was accidentally reinitialising the USB hub every frame, and removing all the controllers prevented this from happening.

56 comments:

skater9269 said...

awesome job strmn you rule i cant wait btw do you know why the mario kart in time trials especially does the lag because i have found that it is aparently linked to either the joypad or turning in general.

Mario Kart God said...

This is great news strmnnrmn!!! Keep up the good work, but why does Mario Kart in R7 all of a sudden lag when it never used to in R6?

skater9269 said...

if this applies to the joystick and not just the pad* then i believe it will help mario kart out because look at my above comment that bug is related to the stick or turning also i think that it slightly effects mario 64 and over all frame rate maybe.

Urkel said...

when do u think it will be ready for release?

skater9269 said...

urkel he says maybe this weekend maybe early this week

Disturbd1 said...

" if this applies to the joystick and not just the pad* then i believe it will help mario kart out because look at my above comment that bug is related to the stick or turning also i think that it slightly effects mario 64 and over all frame rate maybe."

Yes.... it will apply to the joystick... just cuz it says pad it doesn't only mean the d-pad or just the face pad

skater9269 said...

disturbed i was just checking i assumed he meant both.

Disturbd1 said...

strmn: How do we go about editing and compiling the source ourselves to incorporate the changes? I don't know how C works, as I'm only familiar with LUA.... Help any1?

Disturbd1 said...

1 more question: Would I need Cygwin if i'm alrdy running Linux? I'm currently dualbooting XP and Ubuntu....

Urkel said...

njismyhome, the only reason that would be a flat out no is because it depends. psmonkey is a great coder, but is working on almost 6 projects at once and is working hard on the summer coding contest.

But I want to know where this line of code is, as I want to try this optimization for myself

Paulnpoosy said...

Great newz StrmnNrmn
How about the expasion pack? and your paypal account so I can donate?
Sorry for bugging you bro there just sum small BUT Needy sugestions!
( O ) ( O )

Paulnpoosy said...

Also that is a very interesting story about your Xbox crises!lol
good luck bro!

JOshISPoser said...

Is there a way someone can edit the data.psp and put the code in there. All I need is a way to edit the data.psp but I can't. Anyways, i have the tools to rebuild it, I just need the edited file. Thank you in advance and great job Strmn.

PSdonkey said...

Hmm, StrmnNrmn, have you been checking your emails lately? The reason I ask is because the line of code that you have changed in your source to get better framerate is exactly what I had sent you in an email with a couple of other optimization tips a couple of weeks ago. I also sent you some code for a user to return back to the main menu from in-game play. I know you probably have been very busy with your work and haven't had time to check your emails but if you check the email I sent you, you will clearly see that I sent you that chunk of code for you to change and also a couple of other changes for the core of your emulator. I was going to implement these changes I sent you in the email in a "pre-R8" build if you didn't have the time to implement the new optimizations I sent you. Regardless, good luck with your project and I hope everything comes together good in your project.

Michael Millet said...

cool!:)

JOshISPoser said...

Hey Malkste, I could compile it into an eboot if you have the data.psp. I have no way of editing the data.psp and know what I am doing but I do know how to compile it though into a working eboot.

Tinnus said...

Funny, I just did that yesterday with the code from R7, but couldn't notice any speedups heh. That's why I thought I had older code, that with this optimization ended up as fast as R7.

Tinnus said...

BTW, I have an idea for speeding things up by a huge lot. Tell the ME of the PSP to translate the code blocks (in the dynarec) while the main CPU executes them.

That is, translate the next block of code BEFORE you get to the part where you'd be supposed to translate it. I think you can do something like...

Translating a block->found branch->stop translating

That's the current way, right? You could do...

(all in the ME)

Translating a block->found branch/cycle limit->store branch address/next PC->stop translating->update the TC (uncached) with the info on the recently-translated code->Translate the next block (PC got from the last branch instruction/last instruction when you saved up there)->keep doing that forever

Main CPU:

want to run PC=xxx->Is it compiled?

Yes: run it

No: tell the ME to translate that block of code as soon as it finished translating the current one, then wait for it to finish. The ME then changes its code translating flow accordingly to the newly-sent code block. In any case waiting for the ME to finish translating that block should be faster than doing the translation itself in the main CPU.

I have this idea for implementing a Dynarec with multiple CPUs, but I'm not sure how well (or fast) it would work since I've not actually tried it (yet). But in theory it should work.

Oh yeah, and remember to flush the ME cache when you finish translating a code block :) CPU communication must be uncached, etc etc. Just to remind you in the case you forget and start to have weird errors =P

JOshISPoser said...

cool emjay. nice find i'm trying out the frameskip. btw, wat would the noframeskip do? anyways, great find and hopefully this will become bigger and strnm will allow it.

Mario Kart God said...

Do I just have to switch both of the eboots for this to work?

JOshISPoser said...

no you don't. the only point of the % eboot is to redirect it to the non % eboot. you can put anything in the non one and it will boot up. i don't really notice a difference really except for the fact that when you run the framerate seems to speed up a little. i'll try the noframeskip one.

Mario Kart God said...

Mario Kart goes much faster!!!!!!!!!! Mario Kart is now going at a steady 23 fps on the first level in Time Mode, with the frameskip. BTW what does the no frameskip do???

Mario Kart God said...

Super Mario 64 also seems like it goes quite a bit faster!!!!!!!

JOshISPoser said...

Holy sh#t! no frameskip, mario is fast. reaching 23 fps running outside the castle. it is awesome. a little choppy though. around 20 inside when running, 10 when standing still. running is all that matters pretty much though. it is soo much faster but a little choppy but it is managable. everything is soo much faster and pulling either 10's in slower parts and 20's in faster parts. it is awesome. I'll try it with another game in a bit.

JOshISPoser said...

are you sure, i downloaded from http://rapidshare.de/files/29776529/daedalus_nofs.zip.html and it is awesomely fast. mario kart is going pretty much full speed for me. probably no more than a tenth of a second off. i think maybe i did download the wrong program. hmmm.... my downloads show me that i last downloaded the no frameskip. maybe i just got lucky with the files????

JOshISPoser said...

yes, i tried both the files. anyways, i'm probably wrong but maybe he can try putting sound in it. i'm trying to rush him, maybe no sync in the sound but sound would be nice.

skater9269 said...

wow play rampage world tour it is crazy fast almost too much and you barely notice the frameskipping.

Mario Kart God said...

Mario Kart 30fps average with the no frameskip version, which is actually the frameskip version!

JOshISPoser said...

yeah, rampage was so fast before, it is just amazing now.

JOshISPoser said...

which link are you using. the first or second post. the second post has the working links.

JOshISPoser said...

just an fyi, tony hawk 2 is a bit faster but still at a just-playable rate. anywho, this build is great and i can't wait for the official one. go strnm! also, wat would you say is the best game right now, except for like the small ones like bust-a-move, rampage, etc.? I would almost say Mario cart because it seems to be going at a great speed.

skater9269 said...

who is using the frameskip and who is using the not?

Tinnus said...

Version with fixed controls function calls (the optimization he said in the post) runs faster than that frameskip one in Mario 64. I did the changes myself and tried it.

skater9269 said...

tinnus the link for both the frameskip and not frameskip both have that fixed so what are you talking about.

Zodionic said...

OMG i just had a great idea for mario kart fans, why dont we do the same line change to R6 and then we will make mario kart playable and add a few fps, i think this is a great idea, if anyone knows how to do this please do so, and post it here Plz, thx

Tinnus said...

skater9269: I'm saying that a no frameskip version, with the fixes, built by myself, runs faster than the frameskip one built by whoever. Mario 64 runs fullspeed in some occasions and very close most of the time in the castle. And mine shows the white BG right =P

Oh, and please learn to use punctuation.

kersplatty: You know what, you can tell explorer to search files for specific content.

JOshISPoser said...

i think strnm needs to post to end everybody's confusion on everything. on frameskip, no frameskip, doing everything yourself, etc.

Tinnus said...

http://www.uploading.com/pt/?get=VOPIWQLC

Set Tesselate Tris to off...

Disturbd1 said...

Tinnus's version does seem to run faster than the frameskip 1 and non frameskip 1 posted by emjay

Paulnpoosy said...

I think this is a bad idea releasing this without Norms consent!
I won't be any part of this unofficial release!
Its pretty sad that you cant wait only 4 maybe five days!
you guys should be ashamed of yourselves!
not trying to be an azz or anything but il wait for StrmnNrms release!
cuz its his emu!

Morgan said...

Well I'm back guys I just got back in today and I'm looking at catching up on all that is new. StrmnNrmn I haven't tried out R7 yet but I'm downloading it now! R8 looks great, I got on the internet on my psp in the Nassau (bahamas) airport and read the latest updates, everything looks very promising! Well I'm back and I think that R8 will be very nice with an ingame menu. And to those who were wondering I did have a great time as usual! Only downside is I got a couple of blisters on my back from that sun, I'm peeling some also.

Urkel said...

well good to have you back morgan.

lol, peeling skin is fun :D

Morgan said...

Yeah I was going to stick with R7 but Tinnus's version seems to lag less in Mario Kart so I'm using that. I will switch to R8 though when it becomes available.

Paulnpoosy said...

One more time!
StrmnNrmn could you implement the expansion pack so it works?
and so that zelda OOT saves please and last get a paypal button here so I can donate because im really happy because you have made Zelda on a handheld so I can play it!
Thank you for your time!( O ) ( O )

JOshISPoser said...

imtiaz, you're right. strnm, you need to post soon because exactly wat imtiaz said. also, just to make sure you're still alive

skater9269 said...

why would he get angry if it is open source he should definitely not be but either way if tinnus did something to daedalus he should be required to release the source right.

Tinnus said...

ryanmwolfe: It's Tinnus, not Tidus, please. Dunno why there are so many FFX addicts who think I call myself Tidus...

psdroideka: I didn't change anything else, I think. Maybe the dude who did the other unofficial release screwed up somewhere and didn't get all the possible speedup. I actually *did* change another part of the code but according to Windows' search function that code was unused :)

tsumaru: I'm the Tinnus from the yoyofr boards, and LJP, and all, but the 'T' in TYL doesn't stand for Tinnus, it's for "Thunderz" :)

ryanwolfe (pt. 2): the FPS counter is treated differently. In Project 64, the FPS actually means "Vsync's per second" and not actual rendered frames. In 5th-gen consoles, games didn't run at full VSync rate, but a smaller multiple like 30, 20, or 15. So fullspeed Mario 64 is 60FPS in Pr.64 and 20FPS (I think) in Daedalus :)

Also, I'm studying Daedalus' code a bit lately, and any changes that I do that I think are worth something, I'll tell strmnnrmn about them.

skater9269 said...

tinus do you know why mario kart is lagging especially when in time trials and have you told strmn about it or maybe he found out himself because i really want him to fix that.

skater9269 said...

sorry about the typo my bad i meant tinnus

JOshISPoser said...

wat's tinnus anyways? i'm not big into video games, i just like to play them. I play without studying them.

Exophase said...

tinnus: I don't think that your ideas for doing translation in the other CPU will yield huge gains because the translation overhead is usually very small compared to the execution overhead (unless extreme optimizations are done, which aren't here.. yet anyway). In my emulator (gpSP, GBA emulator for PSP) I've seen hundreds of kb of code get translated in one go without so much as putting a dent in the frame rate. Plus Daedalus has a hot spot style of recompiler so it only bothers compiling code that has already been encountered a lot.

Think about it though, translation caches for these emulators on PSP tend to be around 1-3MB, there's not a lot of processing overhead involved with pushing that much data around.

Disturbd1 said...

exophase:
Correct me if im wrong, but that (the cache) is similar to the process NJ used for the CPS2 emulator to see such a huge speed gain, right?

Morgan said...

Chaz I would like to see your menu and icon pics you made but the link isn't working. It says file not found, and I know the last word in the URL is Menu. Please could you post a fixed link. Thanks

Morgan said...

Thanks for posting a fixed link, they're cool but I think I'll stick to PSDonkey's backgrounds. I use the pochi icon for selecting Daedalus though.

Tinnus said...

skater9269: I don't like Mario Kart, I won't look into it because you asked, and I won't look into anything anyone asks for. Just to prevent future problems.

joshisposer: Well, it's a name I created, nothing to do with videogames. Can't I call myself whatever I want? :P