In response to Plans for R6 xiringu and ukcuf16 had a couple of interesting suggestions for performance improvements.
First up, from xiringu:
instead of working with a 300x200 screen, work with only half height 150x200 and then display an empty line every other line to get the final 300x200.
That's an interesting idea - it's a trick that's been used by demo coders for years to get a few extra fps. I'm not sure this is going to provide all that much of a speedup to Daedalus though :( The reason for this is that currently rendering only contributes a small amount to the overall cost of each frame, so even if rendering time was totally eliminated, the framerate wouldn't change much. As an example, let's take something like Zelda which currently runs at around 4 fps. At 4fps it means each frame takes 1000/4 = 250 milliseconds to render each frame, which is broken down something like this:
CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 10 ms
Total: 200 + 40 + 10 = 250 ms (i.e. 1000/250 = 4fps)
Assuming that we could totally eliminate the rendering time, this would now look like:
CPU emulation: 200 ms
Display list parsing: 40 ms
Rendering: 0 ms (no cost)
Total: 200 + 40 = 240 ms (i.e. 1000/240 = 4.17fps)
So the very best we could hope for in this case would be a .17fps improvement in the framerate :(
ukcuf16 wrote:
Just wanted to ask if there is ever going to be frame skip in later versions :)
What ukcuf16 is suggesting is that the emulator renders one frame, then skips the next. Alternating frames like this should halve the cost of rendering, at the cost of making the framerate a little less smooth.
Again, this is an interesting idea, but I don't really see this having much impact on the framerate as things stand at the moment. Working out the potential speedup is a little more complicated, as we have to take the average time over two frames. The numbers look something like this:
Frame1 CPU emulation: 200 ms
Frame1 Display list parsing: 40 ms
Frame1 Rendering: 10 ms
Frame2 CPU emulation: 200 ms
Frame2 Display list parsing: 0 ms (skipped)
Frame2 Rendering: 0 ms (skipped)
Total: 200 + 40 + 10 + 200 = 450 ms
Average: 450 / 2 = 225 ms (i.e. 1000/225 = 4.44fps)
So even implementing a frame skip mechanism would only give a tiny 0.5fps speedup.
To take this example to its ultimate conclusion, let's assume that I could somehow eliminate the entire cost of display list parsing and rendering:
CPU emulation: 200 ms
Display list parsing: 0 ms (no cost)
Rendering: 0 ms (no cost)
Total: 200 ms (i.e. 1000/200 = 5fps)
Even if I could somehow (magically) reduce the cost of rendering to 0 milliseconds, we'd still only see a 1fps speedup. However, if I can halve the cost of CPU emulation (which is much more likely given the speedups already seen with the new dynarec engine) this is what the calculations look like:
CPU emulation: 100 ms (now twice as fast)
Display list parsing: 40 ms
Rendering: 10 ms
Total: 100 + 40 + 10 = 150 ms (i.e. 1000/150 = 6.66fps)
At the moment I feel that there are more gains to come from optimising the CPU emulation, which is why I've been concentrating on this area recently. As the cost of CPU emulation falls relative to rendering then the ideas suggested by xiringu and ukcug16 will start to become more attractive.
-StrmnNrmn