Deciding what to optimise

Whenever I start to answer questions on the comment pages I always end up going into too much detail for a quick response and end up deciding to put up a new post instead. I hope this isn't too annoying :)

In response to Plans for R6, xiringu and ukcuf16 had a couple of interesting suggestions for performance improvements.

First up, from xiringu:

instead of working with a 300x200 screen, work with only half height 150x200 and then display an empty line every other line to get the final 300x200.

That's an interesting idea - it's a trick that's been used by demo coders for years to get a few extra fps. I'm not sure this is going to provide all that much of a speedup to Daedalus though :( The reason for this is that currently rendering only contributes a small amount to the overall cost of each frame, so even if rendering time was totally eliminated, the framerate wouldn't change much. As an example, let's take something like Zelda which currently runs at around 4 fps. At 4fps, each frame takes 1000/4 = 250 milliseconds, which is broken down something like this:

CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 10 ms
Total: 200 + 40 + 10 = 250 ms (i.e. 1000/250 = 4fps)

Assuming that we could totally eliminate the rendering time, this would now look like:

CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 0 ms (no cost)
Total: 200 + 40 = 240 ms (i.e. 1000/240 = 4.17fps)

So the very best we could hope for in this case would be a 0.17fps improvement in the framerate :(
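
The arithmetic above can be sketched as a few lines of Python. This is only an illustration of the sums in this post: the function name and dictionary keys are made up for the example, and the millisecond figures are the rough estimates quoted above, not profiler output.

```python
def frames_per_second(costs_ms):
    """Return the framerate implied by a dict of per-frame costs in milliseconds."""
    total_ms = sum(costs_ms.values())
    return 1000.0 / total_ms

# Rough per-frame estimates for Zelda, taken from the figures in the post.
current = {"cpu": 200, "display_list": 40, "rendering": 10}

# Hypothetical best case: rendering costs nothing at all.
no_rendering = dict(current, rendering=0)

baseline_fps = frames_per_second(current)
best_case_fps = frames_per_second(no_rendering)

print(f"{baseline_fps:.2f} fps -> {best_case_fps:.2f} fps")
```

Changing the numbers in `current` shows how little the result moves while the CPU cost stays at 200 ms.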

ukcuf16 wrote:

Just wanted to ask if there is ever going to be frame skip in later versions :)

What ukcuf16 is suggesting is that the emulator renders one frame, then skips the next. Alternating frames like this should halve the cost of rendering, at the cost of making the framerate a little less smooth.

Again, this is an interesting idea, but I don't really see this having much impact on the framerate as things stand at the moment. Working out the potential speedup is a little more complicated, as we have to take the average time over two frames. The numbers look something like this:

Frame1 CPU emulation: 200 ms
Frame1 Display list parsing: 40 ms
Frame1 Rendering: 10 ms
Frame2 CPU emulation: 200 ms
Frame2 Display list parsing: 0 ms (skipped)
Frame2 Rendering: 0 ms (skipped)
Total: 200 + 40 + 10 + 200 = 450 ms
Average: 450 / 2 = 225 ms (i.e. 1000/225 = 4.44fps)

So even implementing a frame skip mechanism would only give a tiny speedup of roughly 0.44fps.
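
The same averaging can be written for any number of skipped frames. This is a sketch under the same assumption as the table above: a skipped frame still pays the full CPU emulation cost but no display list or rendering cost. The function and parameter names are invented for the example.

```python
def average_fps_with_frameskip(cpu_ms, display_list_ms, rendering_ms, skip):
    """Average framerate when `skip` frames are skipped after each rendered frame.

    Assumes every frame pays the CPU emulation cost, and only the one
    rendered frame in each group pays for display list parsing and rendering.
    """
    frames = skip + 1
    total_ms = frames * cpu_ms + display_list_ms + rendering_ms
    return 1000.0 * frames / total_ms

no_skip_fps = average_fps_with_frameskip(200, 40, 10, skip=0)
skip_one_fps = average_fps_with_frameskip(200, 40, 10, skip=1)

print(f"no skip: {no_skip_fps:.2f} fps, skip 1: {skip_one_fps:.2f} fps")
```

However many frames are skipped, the average can never pass 1000/200 = 5fps with these figures, because the CPU cost is paid on every frame.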

To take this example to its ultimate conclusion, let's assume that I could somehow eliminate the entire cost of display list parsing and rendering:

CPU emulation: 200 ms
Display list parsing: 0 ms (no cost)
Rendering: 0 ms (no cost)
Total: 200 ms (i.e. 1000/200 = 5fps)

Even if I could somehow (magically) reduce the cost of display list parsing and rendering to 0 milliseconds, we'd still only see a 1fps speedup. However, if I can halve the cost of CPU emulation (which is much more likely given the speedups already seen with the new dynarec engine) this is what the calculations look like:

CPU emulation: 100 ms (now twice as fast)
Display list parsing: 40 ms
Rendering: 10 ms
Total: 100 + 40 + 10 = 150 ms (i.e. 1000/150 = 6.67fps)

At the moment I feel that there are more gains to come from optimising the CPU emulation, which is why I've been concentrating on this area recently. As the cost of CPU emulation falls relative to rendering, the ideas suggested by xiringu and ukcuf16 will start to become more attractive.
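
All of the cases above are instances of what is usually called Amdahl's law. The symbols here are introduced just for this note: p is the fraction of the frame time spent in the part being optimised, and s is how many times faster that part becomes. The fractions come from the rough 200/40/10 ms estimates, so treat the results as ballpark figures.

```latex
\[
  S = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]
% Rendering eliminated:       p = 10/250 = 0.04, s -> infinity
\[
  S = \frac{1}{1 - 0.04} \approx 1.04
  \quad\Rightarrow\quad 4 \times 1.04 \approx 4.17\ \text{fps}
\]
% CPU emulation twice as fast: p = 200/250 = 0.8, s = 2
\[
  S = \frac{1}{0.2 + 0.8/2} = \frac{1}{0.6} \approx 1.67
  \quad\Rightarrow\quad 4 \times 1.67 \approx 6.67\ \text{fps}
\]
```

The bigger p is, the more an optimisation there pays off, which is why the CPU emulation (p = 0.8) is the obvious target for now.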



  1. Strmnnrmn we don't get annoyed by these updates actually we look forward to them. It's nice to read something new about the progress, keep it up! I don't know anything about coding so, but yeah what is the next big thing you could do to help the speed and smoothness increase. Lastly in the comments on the last update PSDonkey said he made some changes to the R5 source AGAIN* and uploaded them and emailed it to you, are his changes anything to help out the progress? And if possible could you provide a link to compiled source he made?

  2. I want it smooth so I wouldn't like it for right now. I just want Stmnnrmn to speed it up in the best ways he knows how, so that it runs great.

  3. Psmonkey good to see you're working on your emulator! Do you have a release planned, maybe in the future? If so, when? How far along is the emulation coming? Like how fast is it running games?

  4. Hi,

    1/ Rendering only odd or even lines is definitely more difficult, considering that you probably translate the N64 graphics calls into native PSP GU calls.

    2/Your benchmarking definitely shows that the dynarec is going to be the key of performance improvement in your emu. and is a MUST as I said before.

    Now it starts to be very clear that you need it to be REALLY smart from now on to get good speed improvements.
    To get all the potential out of the dynarec you will need to completely analyze the generated chunk and also optimize it globally (useless moves, better use of the delay slot, avoiding useless pipeline flushes maybe?).

    Seeing that, I am not sure that you should generate ASM instructions directly into the chunk while decoding the N64 code, rather than rely on an "intermediate" encoding that allows you to build a tree of operations, detect register usage and so on...
    And then generate your PSP code at the end, once a chunk has been parsed and optimized.

    Anyway, just throwing out ideas, as usual, coz I dont have so much time to look at the source in details.


  5. Well regardless of any other improvement I really hope you can fix the cause of freezing in 2.0 tiff... Nowadays seems that tiff is always a problem...

    R4 froze like less than 10 minutes after starting Mario 64 and R5 freezes before I even enter the castle...

    I'd love to pass Mario64 all over again in the PSP... And Mario Kart 64...

    I can tell this is a really tough job and that you put a lot of hard work into it.

    If you ever try to tackle the gameplay time crashes in 2.0 tiff and need anyone to report on it mail it to me at (sorry I bet you are tired of ppl begging to beta but it's just that so many new apps work in GTA and not in tiff... like all GBA emus apart from PSPGBA 1.0 by psp298)

  6. Hey why don't you downgrade to 1.5 and use devhook? I was a 2.0 user because I loved GTA but then when devhook came out I said I'll test it out. I love 1.5 because I know I get the fastest and best homebrew and I can play any UMD up to 2.5! I'm staying where I'm at forever now and I can't wait till Devhook is up to 2.7 and it looks like that could be very soon!

    Ps. Strmnnrmn please gives us a link to the changes donkey made and sent to you on the R5. The changes he said in the last update in the comments! You have a link to a compiled source code?!

  7. One thing I've been wondering. What's the native N64 frame speed supposed to be for Zelda 64? Is it 16.7ms (60FPS) or 33.3ms (30FPS)? Or something else altogether...

  8. Strmnnrmn please answer my question!

  9. Yes, Morgan. Norman is very busy right now. For those of you who want the updated changes to the source, here are the links to the 1.0 and 1.5 binaries. I already sent the source to norman so anyone who wants to take a look at the source feel free to either email norman or myself for the updated source code and one of us will send you the source.
    Basically this new R5 edition has the new eboot icons and sound from before and also the new menu backgrounds from before as well. The newer code that was implemented is the addition of menu music while you are browsing through the options and rom list. As soon as you choose a rom to load, the music stops and clears the memory for the use of loading the rom into the memory. You can also create your own custom mp3 music for the menu by putting it inside your Daedalus folder and naming your mp3 "menu.mp3". It doesn't really matter how long your music is because once it finishes, the music will repeat itself until you choose a rom to play, then the music will be freed from the memory usage so that the emulator will still have all the necessary memory that it needs. Also, once again, this new R5 update has the rom folder the same as the folder that Monkey64 uses. You need to place all your rom files in the root of the memory stick called "n64" just like the way Monkey64 does. So there is no more "Rom" folder in Daedalus. I am working on a few more additions to Daedalus for when R6 comes out. Once again to make things clear, all 100% credit goes to StrmnNrmn for this awesome work of his. Anyone else who has some new ideas for StrmnNrmn's future releases, just drop him an email with the updated source and I am sure he will incorporate it into the new release. Here are the binary files cut in half so that you can see the whole link. Just copy and paste the top part and the bottom part of the link into your browser.

    Link for R5 Changes for 1.00

    Link for R5 Changes for 1.50

    Here are the full links if you can view them

  10. Thanks psdonkey!
    I love the new changes!

  11. PSDonkey thanks for providing the links I'm going to test them out now! But yeah thanks for dedicating your time to this emulator. But really if you have a working N64 emulator why not release it? Or couldn't you at least give Norman some serious help on getting his very fast? Either way thanks and sorry for getting a little impatient, StrmnNrmn keep up the great work!

  16. long time no see.

    must be about 5 or 6 years now... which is about the length of time it's been since i did any programming... so bear with me if this makes no sense at all... or has all already been implemented.

    is there a reason why dynamic recompilation has to be done dynamically?

    i understand that you don't know what execution path the code is going to take, so you need to do most stuff dynamically. but there's bound to be blocks of code that can be optimised outside of the emulation itself.

    you could have a kind of static recompilation. a pre-processor that would either create a custom rom for use with the emulator... or maybe generate an intermediate file.. like object code to be used as a kind of patch for the rom. could also perform any texture conversions and any other conversions on other things. that might need... ummm... converting...

    anyway... other thing is... doesn't the psp have about 18 processors or something? specifically... doesn't it have 2 x R4000s? could these be used together to either run 2 threads simultaneously or perhaps to speed up stuff like 64 bit operations that aren't supported.

    also, doesn't the psp have 2 graphics cores, which from what i can tell, seem analogous to the n64's rdp and rsp. i have no idea how they work... but it'd be good if you could offload some of that rsp/rdp stuff to them... but then i spose it'd be good if we had flying cars too.

    it's no wonder i got kicked out of 2 universities....

  17. Looks as if they have Goldeneye, Mario 64, and Zelda running full speed! Check out pspupdates, Strmnnrmn what is your opinion on this? If this thing is true are you going to continue building?

  18. StrmnNrmn I would also like to see a ME (media engine) version of the emulator. Most psps are 1.5 now so it would make sense to use the Media Engine. I'm sure it would help performance and hey it may open some doors for you.

    Ps. Please answer my question above also when you can, and keep up the great work I know your busy working on it now!

  19. StrmnNrmn you said by early July you could have something for us, such as R6? Well it's now July 5th so maybe could you post here about progress or whatever you've been working on since you returned from vacation. Also I mentioned using the Media Engine now that most PSP's are 1.5 is this a possibility? Thanks for your hard work thus far and please never give up!

  20. That is very true I don't ever really remember using the dpad really EVER! But yeah StrmnNrmn are we close to a release because you did say at around this time you "would have something for us". I'm not trying to hurry you along I'm patient and I want the best emulator you can build I just want to know if we are close and if you maybe had a date?

  21. If you are cheering on StrmnNrmn to build this emulator so you can play pokemon stadium. Are you a complete clown or what, take that stupid game and throw it away. Give me some goldeneye, starfox, diddy kong racing, mario 64! Pokemon you should be ashamed of yourself. Keep up the great work StrmnNrmn!

  22. Plus how are you going to play pokemon stadium if you had to have your gameboy cartridge hooked up, come on man think!

  26. Sound would greatly slow down the emulator so I don't expect sound for a while.

  27. Are you talking about the build with the music in the background (r5 remix)? That was just a pic from the emu and the sound from mario in atrac3. Just an eboot file customized, I do that with all my eboots.

  28. Wow, no updates in like what, 2 weeks already... Where is this guy, lol... Any1 heard from him? Thought there was going to be a release of R6 rly soon.

  29. StrmnNrmn probably is hard at work, I hope he's working and that's why we haven't heard from him. But yeah StrmnNrmn you could at least give us an update or at least a comment saying something new.

  30. Argh - tons and tons of comments here! Apologies for the late replies :( I'm going to tackle a few of these before I head off to bed :)

    frmariam: I suspect the memory leak I just fixed may have been responsible for the freezing you talk about. The main difference between R4 and R5 is the dynarec (which uses a lot more memory). I think the reason you're seeing the freezing happening earlier is because R5 is running out of memory more quickly. Hopefully the memory leak fix will sort this out for you.

    flyingbuzz: There are actually three areas I'm looking at doing this for: matrix/vector manipulation, texture format conversion, and triangle setup. I think these three things account for most of the time spent processing display lists, so I'll be looking at doing this as I improve the dynarec performance.

    sroon: I think a frameskip option is always going to involve a tradeoff between lagginess and overall speed. At least if it's an option people can choose whichever they prefer.

    cms108: long time no see indeed mate :)) Seems like eons ago that we were disassembling roms for the first time in Wellington street!
    You could probably preprocess some stuff in advance. I think with the dynarec the idea is just to get the cost of recompiling so low that it's insignificant compared to the time spent executing it. It might make more sense to pre-convert textures, but it would take a lot longer to read them from the memory stick, and I'm not sure there's enough free memory to hang on to them there :(
    It's definitely worth trying to get both PSP cores fully utilised. One idea I was playing with was to emulate the CPU on the first core, and RSP (including display/audiolist HLE) on the second core. The n64 code should handle all the synchronisation issues itself, which is nice.
    Anyway, I'll have to drop you a line soon. Did you know Steve had a sprog?!

    morgan: Any links about a fullspeed emu for those roms seem to have gone now :( In any case, competition is always good :)
    Yup, I agree taking advantage of the ME needs to be investigated.

    flyinghippo/ukcuf16/morgan: A few people have suggested this usage of the O button. I've got this working in my R6 build, but I'd like to make it configurable (i.e. at least whether the default state is for Dpad or Cbuttons)

    exoskeletor: Unfortunately I think sound is going to be some time away :( Like morgan mentions (several times :) it will slow things down somewhat, and while the framerate is like it is now, it'll sound all choppy and horrible.

    disturbd1: Me too :)
