Monday, June 26, 2006

Deciding what to optimise

Whenever I start to answer questions on the comment pages I end up going into too much detail for a quick response, and decide to put up a new post instead. I hope this isn't too annoying :)

In response to Plans for R6 xiringu and ukcuf16 had a couple of interesting suggestions for performance improvements.

First up, from xiringu:

instead of working with a 300x200 screen, work with only half height 150x200 and then display an empty line every other line to get the final 300x200.


That's an interesting idea - it's a trick that's been used by demo coders for years to get a few extra fps. I'm not sure this is going to provide all that much of a speedup to Daedalus though :( The reason for this is that currently rendering only contributes a small amount to the overall cost of each frame, so even if rendering time was totally eliminated, the framerate wouldn't change much. As an example, let's take something like Zelda which currently runs at around 4 fps. At 4fps, each frame takes 1000/4 = 250 milliseconds, which is broken down something like this:

CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 10 ms
Total: 200 + 40 + 10 = 250 ms (i.e. 1000/250 = 4fps)

Assuming that we could totally eliminate the rendering time, this would now look like:

CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 0 ms (no cost)
Total: 200 + 40 = 240 ms (i.e. 1000/240 = 4.17fps)

So the very best we could hope for in this case would be a 0.17fps improvement in the framerate :(
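The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the rough Zelda figures from this post; the names `costs_ms` and `fps` are made up for the example and don't correspond to anything in the Daedalus source. It assumes the three stages run one after another, so their costs simply add up.

```python
# Approximate per-frame costs in milliseconds (the Zelda example figures).
costs_ms = {"cpu": 200, "display_list": 40, "rendering": 10}

def fps(costs):
    """Frames per second when the stages of a frame run back to back."""
    return 1000.0 / sum(costs.values())

baseline = fps(costs_ms)
# Hypothetical best case: rendering costs nothing at all.
no_render = fps({**costs_ms, "rendering": 0})

print(f"baseline:       {baseline:.2f} fps")
print(f"free rendering: {no_render:.2f} fps")
print(f"gain:           {no_render - baseline:.2f} fps")
```

Plugging in different numbers for another rom shows quickly whether a rendering trick is worth chasing there.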

ukcuf16 wrote:

Just wanted to ask if there is ever going to be frame skip in later versions :)


What ukcuf16 is suggesting is that the emulator renders one frame, then skips the next. Alternating frames like this should halve the cost of rendering, at the cost of making the framerate a little less smooth.

Again, this is an interesting idea, but I don't really see this having much impact on the framerate as things stand at the moment. Working out the potential speedup is a little more complicated, as we have to take the average time over two frames. The numbers look something like this:

Frame1 CPU emulation: 200 ms
Frame1 Display list parsing: 40 ms
Frame1 Rendering: 10 ms
Frame2 CPU emulation: 200 ms
Frame2 Display list parsing: 0 ms (skipped)
Frame2 Rendering: 0 ms (skipped)
Total: 200 + 40 + 10 + 200 = 450 ms
Average: 450 / 2 = 225 ms (i.e. 1000/225 = 4.44fps)

So even implementing a frame skip mechanism would only give a tiny speedup of around 0.44fps.
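The two-frame average generalises to skipping more than one frame. Here's a minimal sketch, assuming (as in the numbers above) that a skipped frame still pays the full CPU emulation cost but nothing for display list parsing or rendering. The function `average_fps` and its `skip` parameter are hypothetical names for illustration only.

```python
def average_fps(cpu_ms, display_list_ms, rendering_ms, skip):
    """Average fps when `skip` frames are skipped after each rendered frame.

    Assumes skipped frames cost only CPU emulation time.
    """
    frames = skip + 1
    # Every frame pays for the CPU; only one frame in the group is drawn.
    total_ms = frames * cpu_ms + display_list_ms + rendering_ms
    return 1000.0 / (total_ms / frames)

for skip in (0, 1, 2, 3):
    print(f"skip {skip}: {average_fps(200, 40, 10, skip):.2f} fps")
```

However many frames are skipped, the average can never get past 1000/200 = 5fps with these figures, because the CPU cost is paid on every frame.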

To take this example to its ultimate conclusion, let's assume that I could somehow eliminate the entire cost of display list parsing and rendering:

CPU emulation: 200 ms
Display list parsing: 0 ms (no cost)
Rendering: 0 ms (no cost)
Total: 200 ms (i.e. 1000/200 = 5fps)

Even if I could somehow (magically) reduce the cost of display list parsing and rendering to 0 milliseconds, we'd still only see a 1fps speedup. However, if I can halve the cost of CPU emulation (which is much more likely given the speedups already seen with the new dynarec engine) this is what the calculations look like:

CPU emulation: 100 ms (now twice as fast)
Display list parsing: 40 ms
Rendering: 10 ms
Total: 100 + 40 + 10 = 150 ms (i.e. 1000/150 = 6.67fps)

At the moment I feel that there are more gains to come from optimising the CPU emulation, which is why I've been concentrating on this area recently. As the cost of CPU emulation falls relative to rendering then the ideas suggested by xiringu and ukcuf16 will start to become more attractive.
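One way to decide where to spend effort is to look at each stage's share of the frame and ask what the framerate would be if that stage alone were halved or made free. The sketch below does this with the same rough figures; `fps_if_scaled` is a hypothetical helper, not part of the emulator.

```python
# Approximate per-frame costs in milliseconds (same example figures).
costs_ms = {"CPU emulation": 200, "Display list parsing": 40, "Rendering": 10}
total_ms = sum(costs_ms.values())

def fps_if_scaled(name, factor):
    """fps if one stage's cost is multiplied by `factor` (0.5 = twice as fast)."""
    new_total = total_ms - costs_ms[name] + costs_ms[name] * factor
    return 1000.0 / new_total

for name, ms in costs_ms.items():
    share = ms / total_ms
    print(f"{name}: {share:.0%} of the frame, "
          f"halved -> {fps_if_scaled(name, 0.5):.2f} fps, "
          f"free -> {fps_if_scaled(name, 0.0):.2f} fps")
```

The stage with the biggest share sets the ceiling: with these numbers, only changes to CPU emulation can move the framerate by more than a frame per second.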

-StrmnNrmn

Thursday, June 22, 2006

Source code rant - update

I was going to post these responses on the comments page, but I was worried that they'd get buried and I have a few important points to make.

From laxer3a:
Now you start to see why the TYL emu source was released shifted by one version from the bin. :-)


I was thinking about doing this, but it's important that people are able to tinker with the source as they please. I usually only ever have the time to refresh the CVS depot when I release anyway, but if other people get involved in the project then this will need to happen more regularly.

gregnoid wrote:
Naa, this pre-release is just a fake !
Becaus PspMonkey had not compiled this.


I need to make it clear I was ranting about PSdonkey, not PSmonkey. I have a lot of respect for PSmonkey and I wouldn't want people to think I was criticising him. It's unfortunate that so many people confuse the two names. Remember: donkey = large four-legged member of the horse family (likes hay, sombreros etc). Monkey = amusing, cheeky primate (likes bananas, mischief, etc) :)

psdonkey said:
About the pre R5 build that I made. Yes I did change a couple of minor things in the source code and things seemed to run a tad bit better. However, after reviewing your updated R5 release, none of the changes that I made in the other build had any effect in this new R5 build that you released. In fact most of what I did was clean up some of the excess code and I can see that you already did this in your R5 release.


I appreciate you clearing up the situation and making your source available - thanks for doing this. I really wasn't expecting to see this happen, so accept my apologies for the criticism I levied in my previous post. I had a bit of a Hulk rage going on and it wasn't really justified.

You make a really good point about having a shared directory for roms between Daedalus and Monkey64 - I'll look at rolling this change into the next official release (R6). If you're really keen on helping contribute to Daedalus then I think you should drop me an email and we can talk about the possibility of adding you as a contributor on sourceforge.

-StrmnNrmn

Plans for R6

I'm back from Spain now. I had a great time in Barcelona, it's a bit of a shame to be back :)

I'm planning to have quite a quick turnaround for the next release, i.e. hopefully I'll have something ready by the end of next week, or early July. I want to concentrate on fixing a bunch of graphical issues (adding new combine modes to fix various pink+black textures etc). I'm also going to look at reducing memory requirements - I think the stability problems associated with running the emulator with dynarec for an extended period of time are a result of running out of memory. This should also help to improve the Expansion Pak support. If I get enough time I'll look at adding support for configuring the controls on a rom by rom basis. Finally, I also want to look at improving savegame support.

-StrmnNrmn

PS Congrats Ghana + Australia :D

Friday, June 16, 2006

Away for a few days

I'm going to Spain for a few days to celebrate a friend's birthday. I'll be back on Monday sometime, so I'll try and respond to various questions/issues about the R5 release then.

Buenos nachos! :)

Thursday, June 15, 2006

PSdonkey's build

PSdonkey made this comment when releasing a pre-release of R5 last week:

I went ahead and compiled the new source for everyone and also added a couple of minor changes to the source for speed improvements.


As I have no way of getting in touch with him directly, I'd like to ask him publicly if he could forward me a set of diffs for the 'speed improvements' he applied. Not just because this is a requirement of the GPL (the license under which the Daedalus PSP source is made available), but because I think it would be beneficial to the project to apply these changes to the main source tree. Having said that, I suspect PSdonkey might simply have been attempting to take partial credit for several weeks' development work. I'll let you know if I receive any diffs...

Daedalus PSP R5

I've just uploaded the R5 release to sourceforge. Here's the changelist:

[+] New DynaRec engine, resulting in significant performance improvements
[+] New front end - ability to toggle a couple of options (more to come)
[+] Save game first pass (eeprom4k, eeprom16k and mempak)
[^] Various interpreting engine optimisations
[~] Use .png fileformat for background images, save ~380KB
[~] Stripped out unnecessary code, save ~250KB


By far the most substantial change was the addition of the new dynarec engine. As detailed in previous posts, this is still some way from completion but is already providing significant benefits (around a 2x increase in speed in many roms).

I've also added the groundwork for a new front end and implemented a first-pass of the savegame system. The savegame support isn't fully tested so I'm expecting a few teething problems - please post any bug reports on the sourceforge site (preferably!), the comments page or email me (check the readme.txt..)

I'm not too sure what I want to concentrate on next. It's been over a month since the last release and I think that it would be a good idea to knock out the next few releases in quick succession, to help me pick up some momentum that I've lost. If I aim to do this I'll probably concentrate on some nice (but quick) improvements such as various graphical fixes, savegame support and per-rom control configuration. Any suggestions welcome :)

Linkage:

R5 Source
R5 for v1.00
R5 for v1.50

Tuesday, June 13, 2006

Brief Update

This is just a brief update as it's been a while since my last post. I'm hoping to release a new build tomorrow, or Thursday at the latest. I was planning on making an official release last weekend, but I got sidetracked polishing a few things. I've got a slight bit more to finalise, but there should be a couple of nice little additions since the source drop last week.

Monday, June 05, 2006

Source updated

Earlier this evening I updated the project CVS repository with the latest version of the code. Normally I only do this when I release a new build, but I know people have been playing with the code and have expressed an interest in seeing the latest developments.

I'm not quite ready to release a new binary yet (still a few more optimisations I want to make and various bugs to fix first), but I'll try and do this within the coming week.

Incidentally, updating the source normally only takes 20 minutes or so, but it took a good couple of hours tonight. Sourceforge updated their CVS service recently (May 12th) and as a result I had to spend a couple of hours updating WinCVS, generating new SSH keys and the like. Hopefully it won't be so painful next time around, or I might just lose the will to live.

As a more general update, I cleared a couple of things from my TODO list this weekend. I'm caching floating point registers for most of the single-precision Cop1 instructions, which are now implemented directly in the dynarec code. I've not timed this in depth yet, but it's shaving 10-20ms/frame off the intro to Mario 64 (Mario's Head), which is particularly FPU heavy (i.e. I'm getting ~160ms/frame rather than ~180ms).

Finally I need to have a good think about how to go about optimising the double-precision floating point performance. As the PSP doesn't have hardware support for double precision floating point this is currently very expensive (i.e. adding 2 doubles on the n64 takes just one instruction - on the psp this balloons to several hundred as it all has to be done in software).

Currently I cheat and cast all the double-precision floats to single-precision values before performing the calculations. Although this is much faster it obviously loses a lot of precision, so I need to be careful it's not going to break any roms. Also even the float->double/double->float conversions are pretty expensive so it's still not an ideal solution. Fortunately not many roms seem to use double-precision maths extensively (presumably because it was relatively expensive on the n64), and where they do use it they don't seem to be too sensitive to the fact that I'm throwing most of their mantissa away :)
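To give a feel for what gets thrown away, here's a small illustration of the precision lost when a double-precision add is done in single precision instead. It isn't the emulator's code: it's a Python sketch that uses the standard `struct` module to round values to 32-bit floats, and `to_single` is a made-up helper name. A single-precision float carries a 24-bit mantissa against 53 bits for a double.

```python
import struct

def to_single(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# b is far smaller than the gap between adjacent single-precision
# values near 1.0 (roughly 1.2e-7), so it vanishes in a float add.
a, b = 1.0, 1e-9

double_sum = a + b
# Emulate the single-precision path: narrow the inputs, add, narrow the result.
single_sum = to_single(to_single(a) + to_single(b))

print(repr(double_sum))  # the small term survives
print(repr(single_sum))  # the small term is lost
```

A rom that only uses doubles for things like positions or timers with modest ranges is unlikely to notice, which fits with what I've seen so far.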