Well, some of that work happened in Y22.
LFS basically draws a lot of geometry vertex for vertex and (I assume) takes little care to group objects with similar state (shaders if any, textures, blending etc.) together before drawing. On older CPUs and GPUs that made sense: the GPU was kept fed anyway, and the CPU overhead of batching like that was too high. These days GPUs are orders of magnitude faster while CPUs mostly aren't, so the CPU can't keep the GPU fed this way any more. The result is a maxed-out CPU and a GPU that is mostly idling.
Changing this is no small feat, though, since it requires rather drastic changes to the design of the renderer. Switching to vertex buffers instead of drawing vertex for vertex is simple(ish) and frees up some CPU load (this happened at least partly in Y22). But the big gains are found in state grouping. Instead of drawing each car and its component pieces one after the other, LFS should draw (for example) all car bodies, then all windscreens, then all wheels, then all tires etc. This minimises the number of state changes the GPU has to make, enabling it to scream through geometry at full tilt.
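To put rough numbers on the state grouping idea, here is a toy sketch in Python. It is not LFS code and makes no D3D calls; `DrawItem`, `build_scene` and the four part names are made up for illustration, and I'm assuming every part type needs its own state and that everything is opaque (transparent geometry usually has to be drawn back to front, which limits how freely it can be reordered). All it does is count how often the state would have to change for the same set of parts in two different submission orders.

```python
from collections import namedtuple

# Hypothetical draw item: which car it belongs to and which render state
# (shader/texture/blend combination) it needs. Illustrative names only.
DrawItem = namedtuple("DrawItem", ["car", "state"])

# Assumption: each part type uses its own distinct state.
PARTS = ["body", "windscreen", "wheel", "tire"]


def build_scene(num_cars):
    """Car-by-car submission order: every part of car 0, then car 1, ..."""
    return [DrawItem(car, part) for car in range(num_cars) for part in PARTS]


def count_state_changes(items):
    """Count how often the state differs from the previously drawn item."""
    changes = 0
    current = None
    for item in items:
        if item.state != current:
            changes += 1
            current = item.state
    return changes


def group_by_state(items):
    """Stable sort so that items sharing a state end up next to each other."""
    return sorted(items, key=lambda item: item.state)


scene = build_scene(20)
print(count_state_changes(scene))                  # 80: one change per part
print(count_state_changes(group_by_state(scene)))  # 4: one change per part type
```

With 20 cars that is 80 state changes drawn car by car versus 4 when grouped. A real renderer would sort on a combined key (shader, texture, blend mode and so on) rather than a single name, but the principle is the same.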
All of this can be done within the D3D8 API, though, and could free up a lot of both CPU and GPU horsepower, which could then be spent on fancy shaders and other goodies.