With Wednesday's changes incorporated, I reprofiled a few roms to see where most of the CPU time was going. Things have changed considerably since I initially talked about deciding what to optimise. Looking at the profiler for Mario 64, the time spent executing display lists is now a much more significant fraction of the total time spent on each frame. Back around R3/R4, only around 20% of the time was spent here. With the latest build, display list processing accounts for around 35-40% of the time. The display list processing hasn't become any slower; it's just becoming more significant as I've optimised the CPU emulation.
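A rough back-of-the-envelope sketch of what those percentages imply, assuming the display list cost in milliseconds really is unchanged between builds. The function name is hypothetical and the numbers are only the shares quoted above, not extra profiler output:

```python
def implied_frame_time_ratio(old_share, new_share):
    """Return new_frame_time / old_frame_time.

    Assumption: display list processing costs the same number of
    milliseconds in both builds, so
        old_share * old_total == new_share * new_total.
    """
    return old_share / new_share


# Display lists went from ~20% of a frame to ~35-40% of a frame.
low = implied_frame_time_ratio(0.20, 0.40)
high = implied_frame_time_ratio(0.20, 0.35)
print(f"new frame time is roughly {low:.0%} to {high:.0%} of the old one")
```

Under that assumption, the whole frame now takes a little over half the time it did around R3/R4, which is why the display list work looks so much bigger in the profile.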
One of the settings I mentioned was worth disabling for a speed boost when I released R7 was the 'Tesselate Large Triangles' option. When this setting is enabled, it causes the display list processor to recursively break up large triangles into smaller pieces. This has been necessary to overcome the PSP's poor hardware clipping support; without breaking the triangles up into smaller pieces, the PSP will often fail to render large triangles, as shown below:
Super Mario 64 without clipping
The large triangles that make up the floor that Mario is standing on are rejected by the PSP, leaving a large hole where the floor should be. By breaking the triangles into smaller pieces before attempting to render them, it reduces the chance that the PSP will decide to discard them.
There were a few problems with the 'Tesselate Large Triangles' setting which I've been working on overcoming this weekend. Firstly, it wasn't perfect - there were plenty of cases where visible triangles would still be culled even when they had been subdivided 3-4 times (which generates 27-81 triangles for each input triangle!). This was always quite noticeable in games with a relatively low camera, such as racing games. The other big problem with this setting was that it was very slow - often adding over 20ms per frame.
This setting was always intended as a quick fix rather than a long term solution, so I've been looking at fixing both of these problems over the past few days. I started by ripping out all the existing polygon clipping and tessellation code and starting from scratch. After a couple of days of hacking I've finally got a replacement system that seems to be clipping everything I've thrown at it perfectly. Here's a shot of the same location in Mario 64:
Super Mario 64 with new clipping code
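For readers unfamiliar with polygon clipping, here is a minimal sketch of one standard approach, Sutherland-Hodgman clipping in homogeneous clip space. The post doesn't say which algorithm the new code uses, so this is an illustration of the general idea rather than the actual implementation. The function names, the `(x, y, z, w)` tuple layout and the `-w <= x, y, z <= w` clip volume are all assumptions:

```python
def clip_against_plane(polygon, distance):
    """One Sutherland-Hodgman step: keep the part of a convex polygon
    where distance(v) >= 0. Vertices are (x, y, z, w) tuples."""
    out = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]  # wraps to the last vertex when i == 0
        d_prev, d_cur = distance(prev), distance(cur)
        if (d_prev >= 0) != (d_cur >= 0):
            # The edge prev->cur crosses the plane: emit the intersection.
            t = d_prev / (d_prev - d_cur)
            out.append(tuple(p + t * (c - p) for p, c in zip(prev, cur)))
        if d_cur >= 0:
            out.append(cur)
    return out


# Assumed clip volume: -w <= x, y, z <= w (one signed distance per plane).
CLIP_PLANES = [
    lambda v: v[3] + v[0],  # left
    lambda v: v[3] - v[0],  # right
    lambda v: v[3] + v[1],  # bottom
    lambda v: v[3] - v[1],  # top
    lambda v: v[3] + v[2],  # near
    lambda v: v[3] - v[2],  # far
]


def clip_triangle(tri):
    """Clip one triangle against every plane, then fan-triangulate the
    resulting convex polygon. Returns a (possibly empty) list of triangles."""
    polygon = list(tri)
    for distance in CLIP_PLANES:
        polygon = clip_against_plane(polygon, distance)
        if not polygon:
            return []  # entirely outside this plane
    return [(polygon[0], polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]


# A triangle poking out past the right-hand edge of the view volume.
straddling = ((0, 0, 0, 1), (2, 0, 0, 1), (0, 1, 0, 1))
clipped = clip_triangle(straddling)
print(len(clipped), "triangles after clipping")
```

Unlike tessellation, this emits only the geometry that actually lies inside the view volume, so a large floor triangle turns into a handful of triangles rather than dozens.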
Now that I have a working version of the code in place, I'm going to look at optimising it. At the moment the new clipping code is roughly as expensive as the tessellation code, but due to the way it's implemented I think it should be much easier to make work with the PSP's VFPU, as I can process batches of vertices in parallel. Ideally I'd like to get this change into the next release, so I'm going to hold off putting the R8 build together until it's ready. I'll let you know how I get on.
-StrmnNrmn
9 comments:
So does this mean it would also help out Mario Kart with this subject?
Fascinating post as always. I look forward to every new release from you.
So if I understand correctly, now we get the same speed boost from this setting except we get better graphics (actually still a lot like the normal settings). With this new code do we get a bigger speed boost than the old code or is it just the same? Thanks StrmnNrmn for the update, I really appreciate it, and just release R8 when you're ready man, but things are really looking great!
morgan: that's wrong. The 'normal' is Tesselate Tris OFF. Tesselate Tris is an extra step done to prevent some polygons from disappearing. He just replaced that with a more effective method.
Oh okay I get it now nevermind.
Nah I can wait! (I guess) it's just I got inside the Deku tree and I wanted to go further!
lol!
oH YA AND THE EXPANSION PACK!
PAYPAL!
Calm down guys I'm sure StrmnNrmn will do all he can to help us out but he does have optimizations he wants to incorporate into the next release. Just give him space and time and he'll have this emulator running great! Keep up the amazing work StrmnNrmn and R8 should be very interesting if you get the ingame menu going.
Sorry I've been neglecting the comments pages over the past couple of posts. When I see 100+ comments I end up feeling quite overwhelmed and not quite sure where to start!
mario kart god: Yes - it helps every rom.
morgan: The new code gives better graphics (much, much better in some cases) and it's also significantly faster than the 'tesselate large triangles' setting (in fact, it's almost as fast as having that setting disabled entirely).
stee: Yes - Mario 64 looks almost perfect now (with the exception of a few places that use mirrored textures.) This should fix the graphics in all roms where large gaps were appearing because polygons weren't being drawn.
wally*won_kenobie: Hiya. Sorry, I haven't checked my email since before the weekend. I have Outlook set up to get all my messages from gmail, and I've not installed that on the new pc yet. I'll get back to you ASAP.
retrogamer: The new clipping code is a lot more efficient at rejecting geometry that's out of the N64's view frustum (i.e. the visible area on screen.) To be honest I'm not sure that the slowdown will be entirely due to this - it's more likely that the CPU emulation is doing a lot more work at this point for some reason. We're not quite ready for sound yet, maybe another couple of releases I think.
psdroideka: 1) As I mentioned above I've not been able to check my email for the past few days, so I don't know yet. 2) It all starts getting confusing when you start talking about frameskip. As I see it the N64's apparent framerate will increase (because every other frame takes 0ms to render), but the PSP's framerate will effectively be reduced (as it's doing twice as much CPU work between rendering each frame.) The main question is whether it feels any better to play or not. 3) Yeah, I'll look into this. I'm planning on taking a good look at the whole savegame system very soon (i.e. R9 or R10).
psdroideka: I actually had a look into why Mario Kart is so choppy. I don't think it's anything to do with the clipping, I think that just makes the problem look worse than it is. Normally, a game will do the following:
Render DList
Flip screens
Render DList
Flip screens
Render DList
Flip screens
It seems that for some reason Mario Kart gets a bit out of sync:
Render DList
Render DList
Flip screens
Flip screens
Render DList
Flip screens
Render DList
Flip screens
etc
In this example it renders two frames, then flips the display twice. This effectively means that it ends up showing the same frame multiple times, before suddenly catching up, which is causing the choppiness. I've no idea why this has started happening now - I'll have to look into it further.
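The two sequences above can be replayed with a tiny simulation to see which frame ends up on screen after each flip. This is only a sketch under a simplified model: it assumes each render overwrites a single back buffer and each flip shows whatever that buffer last held, which may not match what the emulator really does. The function and event names are made up for illustration:

```python
def frames_shown(events):
    """Return the frame number visible after each flip.

    Assumed model: every "render" overwrites one back buffer with the
    next frame, and every "flip" displays whatever that buffer holds.
    """
    next_frame = 0
    back_buffer = None
    shown = []
    for event in events:
        if event == "render":
            next_frame += 1
            back_buffer = next_frame
        elif event == "flip":
            shown.append(back_buffer)
    return shown


in_sync = ["render", "flip"] * 3
out_of_sync = ["render", "render", "flip", "flip",
               "render", "flip", "render", "flip"]

print(frames_shown(in_sync))      # [1, 2, 3]
print(frames_shown(out_of_sync))  # [2, 2, 3, 4]
```

In the out-of-sync case frame 1 is never displayed and frame 2 is displayed twice, which matches the stutter-then-catch-up effect described above.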