I've added code to handle the following ops:
- MULT, MULTU (multiply, multiply unsigned)
- DIV, DIVU (divide, divide unsigned)
- MFLO, MFHI (move from lo/hi)
- MTLO, MTHI (move to lo/hi)
- LB, LBU (load byte, load byte unsigned)
- LH, LHU (load halfword, load halfword unsigned)
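For anyone unfamiliar with these ops, here is a rough C++ sketch of what a few of them have to compute. It is an illustration only, not the code the recompiler actually emits: the `HiLo` struct and the function names are made up for this example, and registers are treated as 32 bits wide for simplicity (on a 64-bit MIPS the results would be sign-extended further).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the HI/LO register pair (32 bits each here).
struct HiLo {
    uint32_t hi;
    uint32_t lo;
};

// MULT: signed 32x32 -> 64-bit product; high word goes to HI, low word to LO.
inline HiLo Mult(int32_t rs, int32_t rt) {
    const int64_t product = static_cast<int64_t>(rs) * static_cast<int64_t>(rt);
    const uint64_t bits = static_cast<uint64_t>(product);
    return HiLo{static_cast<uint32_t>(bits >> 32), static_cast<uint32_t>(bits)};
}

// MULTU: same idea, but both operands are treated as unsigned.
inline HiLo Multu(uint32_t rs, uint32_t rt) {
    const uint64_t product = static_cast<uint64_t>(rs) * static_cast<uint64_t>(rt);
    return HiLo{static_cast<uint32_t>(product >> 32), static_cast<uint32_t>(product)};
}

// DIVU: quotient goes to LO, remainder to HI.
// Assumption: the caller deals with a zero divisor separately, since the
// result is unpredictable on real hardware and undefined in C++.
inline HiLo Divu(uint32_t rs, uint32_t rt) {
    assert(rt != 0);
    return HiLo{rs % rt, rs / rt};
}

// LB / LH sign-extend the loaded value; LBU / LHU zero-extend it.
inline uint32_t SignExtendByte(uint8_t value) {
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(value)));
}

inline uint32_t ZeroExtendByte(uint8_t value) {
    return static_cast<uint32_t>(value);
}

inline uint32_t SignExtendHalf(uint16_t value) {
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(value)));
}

inline uint32_t ZeroExtendHalf(uint16_t value) {
    return static_cast<uint32_t>(value);
}
```

MFLO/MFHI and MTLO/MTHI are then just moves between a general purpose register and one half of that pair.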
So far I'm seeing around a 5-6% speedup with these changes (on top of the 10-12% speedup I talked about on Sunday). I am generating slightly more code as a result of this work, but given the large savings I made over the weekend this isn't much of an issue.
My next job is to look at optimising the remaining load/store instructions - I just have LWU/SB/SH to do (ignoring the 64-bit instructions for now). Once that's done I'm going to have a look at optimising sequences of load/store operations by caching the base address between uses. I think that should give a significant speedup for memory-intensive chunks of code.
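To give a feel for the base address caching idea, here is a minimal C++ sketch. It is written interpreter-style rather than as generated code, and everything in it (`GuestMemory`, `CachedBase`, the flat memory layout, the translation counter) is hypothetical and only there for illustration. The real optimisation would keep the translated base in a host register across a run of load/store instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat guest memory. Translate() stands in for the (relatively
// expensive) guest-to-host address lookup, and counts how often it runs.
struct GuestMemory {
    std::vector<uint8_t> ram;
    int translations = 0;

    explicit GuestMemory(std::size_t size) : ram(size, 0) {}

    uint8_t* Translate(uint32_t guest_addr) {
        assert(guest_addr < ram.size());
        ++translations;
        return ram.data() + guest_addr;
    }
};

// Remembers the host pointer for the most recently used base register value.
// Assumption: base + offset stays inside the same contiguous region as base,
// so the translation only needs to be done once per base value.
struct CachedBase {
    bool valid = false;
    uint32_t guest_base = 0;
    uint8_t* host_base = nullptr;

    uint8_t* Resolve(GuestMemory& mem, uint32_t base, int16_t offset) {
        if (!valid || base != guest_base) {
            host_base = mem.Translate(base);  // slow path: translate the base
            guest_base = base;
            valid = true;
        }
        return host_base + offset;            // fast path: just add the offset
    }

    // Call this whenever the cached translation may no longer hold,
    // for example at the end of the load/store sequence.
    void Invalidate() { valid = false; }
};
```

With this, a run of accesses such as offsets 0, 4 and 8 from the same base register value costs one translation instead of three. The saving only applies while the base register is unchanged, which is why it suits tight sequences of loads and stores.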