Analysis of WR and League race times to determine performance factors
Hello everyone,

I've been looking at performance balance in various ways for several years. I recently came across JASP at work, and I've now put the LFS world records as of 10 October 2024, together with several league races, into a multiple linear regression. My goal is to determine the performance factors and perhaps derive a performance index that could be used to apply BOP and to evaluate the theoretical performance of car mods. The attached preliminary results are a work in progress.

The starting point for the analysis is the LFS World Records. I copied the tables from lfsworld.net on 10 October 2024 and converted the lap times from the typical format to seconds. Knowing the length of the track, the lap times were converted to speeds in metres per second. I also extended the dataset to include selected league races from that year.
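A minimal sketch of the conversion described above, assuming lap times are written as `m:ss.xx`. The function names and the track length used here are hypothetical placeholders, not values from the attached dataset.

```python
import math


def lap_time_to_seconds(text: str) -> float:
    """Convert 'm:ss.xx' (also 'h:mm:ss.xx' or plain 'ss.xx') to seconds."""
    seconds = 0.0
    # Each colon-separated part is worth 60 times the part to its right.
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def mean_speed(track_length_m: float, lap_time: str) -> float:
    """Mean lap speed in metres per second."""
    return track_length_m / lap_time_to_seconds(lap_time)


# Hypothetical track length in metres, for illustration only.
TRACK_LENGTH_M = 3300.0

speed = mean_speed(TRACK_LENGTH_M, "1:12.52")
log_speed = math.log(speed)  # natural log, the dependent variable of the model
print(f"{speed:.2f} m/s, log(speed) = {log_speed:.3f}")
```

The same helper also accepts a bare split time such as `46.27`, because a string without a colon is read as plain seconds.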
Then I collected information about the cars. At first I only took the information available on lfs.net, but it turned out that the car datasets on the content page did not fully match the datasets of the mods on the files page, so I took the data from in-game. This also allowed me to get downforce values for the cars, although I only used the base setup values at 40 m/s in the linear regression.

At first the results were rather unimpressive, with large residuals. Including the circuits as factors improved the results greatly, and I am hopeful that I am on the right track. Other covariates such as torque did not produce significant results, so they were not considered at this stage. I also decided to use the natural logarithm of the speed as the dependent variable, so that each exponentiated coefficient acts as a multiplicative factor on speed.

The regression model currently has the following dependent variable:
  • log(Speed)
the following covariates:
  • Mass
  • Power
  • Weight distribution (front)
  • Engine size
  • Downforce Lift
  • Downforce Drag
and the following factors:
  • Track (not shown below)
  • Category
  • Drivetrain
  • Tyres
  • Engine layout
  • Transmission

| Model | Term | Unstandardized | Standard Error | Standardized | t | p |
|--------|------------------|--------------------|----------------------|--------------|------------|------------|
| M₀ | (Intercept) | 3.722 | 0.007 | | 554.386 | < .001 |
| M₁ | (Intercept) | 3.127 | 0.039 | | 79.631 | < .001 |
| | Mass | -1.901 × 10⁻⁴ | 6.990 × 10⁻⁶ | -0.247 | -27.200 | < .001 |
| | Power | 0.001 | 2.267 × 10⁻⁵ | 0.591 | 52.229 | < .001 |
| | Weight dist F | 0.004 | 3.965 × 10⁻⁴ | 0.124 | 10.263 | < .001 |
| | Size | 2.189 × 10⁻⁵ | 4.023 × 10⁻⁶ | 0.067 | 5.441 | < .001 |
| | Downforce Lift | 1.092 × 10⁻⁴ | 4.673 × 10⁻⁶ | 0.534 | 23.368 | < .001 |
| | Downforce Drag | 7.009 × 10⁻⁴ | 3.904 × 10⁻⁵ | 0.376 | 17.953 | < .001 |
| | Category (Saloon car) | -0.076 | 0.019 | | -4.029 | < .001 |
| | Category (GT) | -0.272 | 0.010 | | -26.648 | < .001 |
| | Category (Prototype) | -0.115 | 0.019 | | -6.174 | < .001 |
| | Category (Bike) | -6.838 × 10⁻⁴ | 0.019 | | -0.037 | 0.971 |
| | Category (Buggy) | 1.403 × 10⁻⁴ | 0.027 | | 0.005 | 0.996 |
| | Drivetrain (FWD) | -0.045 | 0.004 | | -10.596 | < .001 |
| | Drivetrain (AWD) | -0.015 | 0.004 | | -3.725 | < .001 |
| | Tyres (Road) | -0.317 | 0.019 | | -16.702 | < .001 |
| | Layout (inline) | 0.187 | 0.009 | | 21.211 | < .001 |
| | Layout (flat) | 0.195 | 0.008 | | 23.547 | < .001 |
| | Transmission (sequential gearbox) | -0.014 | 0.009 | | -1.572 | 0.116 |
| | Transmission (sequential gearbox with ignition cut) | 0.019 | 0.006 | | 3.343 | < .001 |
| | Transmission (H-pattern gearbox) | 0.062 | 0.007 | | 9.025 | < .001 |
| | Transmission (motorbike gearbox) | -0.119 | 0.011 | | -10.513 | < .001 |
| | Transmission (centrifugal clutch) | 0.162 | 0.033 | | 4.983 | < .001 |

The input data and the results PDF are included in the attached archive. Because the dependent variable is log(Speed), the resulting coefficients must be transformed with the exponential function to obtain factors on speed. The reference car is a formula car with a V-engine, slick tyres and a paddle-shift gearbox, driven on BL1. All factor coefficients shown are deviations from this standard car.
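As a rough illustration of that transformation, the sketch below exponentiates a few factor coefficients copied from the table and expresses them as a percentage change in speed relative to the reference car. The helper names are made up for this example.

```python
import math


def speed_factor(beta: float) -> float:
    """Multiplicative effect on speed of a coefficient from the log(Speed) model."""
    return math.exp(beta)


def percent_change(beta: float) -> float:
    """The same effect as a percentage change relative to the reference level."""
    return (math.exp(beta) - 1.0) * 100.0


# Factor coefficients copied from the M1 rows of the table above.
coefficients = {
    "Drivetrain (FWD)": -0.045,
    "Drivetrain (AWD)": -0.015,
    "Tyres (Road)": -0.317,
    "Transmission (H-pattern gearbox)": 0.062,
}

for name, beta in coefficients.items():
    print(f"{name}: x{speed_factor(beta):.3f} ({percent_change(beta):+.1f} %)")
```

For example, the FWD coefficient of -0.045 corresponds to a factor of about 0.956, that is roughly 4.4 % less speed than the reference drivetrain with everything else held equal.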

Looking at the standardised coefficients, we can already see that power (0.591) has the greatest influence on the car's performance. However, it is closely followed by the downforce (lift) of the car (0.534).

I am at a very early stage in this analysis. I hope to get some meaningful results from it. If not, I will at least gain experience with JASP. I'm looking forward to your ideas and insights. Maybe you will find weaknesses I can work on. I'd like to improve this approach in the future.

Best regards!
Attached files
2024-11-19_analysis_2024-10-18_WR.zip - 1.7 MB - 98 views
Are you trying to plot which property makes a car fast?
I am not sure what I am looking at here. Many numbers for sure.
Some random thoughts:

Races often already have some BOP applied, that makes it hard to compare.
Even if no BOP is used, the track, distance, mandatory pit stops etc might be chosen in a way to make it balanced. Typical example for TBO class might be that FXO is fastest but drivers eventually need to slow down to save tires.

Another idea:
Maybe try to figure out something about the difficulty/technicality of car/track combos by plotting a histogram of all uploaded times.
For example for XFG@Kyoto Oval I would expect that all the top times are very close to each other. On the other hand something like Fo8 at South City might have bigger gaps.
There might be a large plateau at XFG@Bl1 of newbies using default setups or other such things.

The ratio between "average speed" and "theoretical car top speed" (aka how much of a lap is at full throttle or high speed) could be interesting, too.
We know that Kyoto Oval is the fastest track, but which track is second fastest and which is the slowest?
(Also for different cars)


Mildly related:
Do you know of a way to get all WR times and their upload date?
It might be interesting to see how WR times have improved over time, how big the improvements were, which WR was standing longest until it got beaten etc.
I don't know if this will be of any use to you, but I have 5 years' worth of best lap time data from Fragmaster's Airio. As you know, XFG, XRG and FOX all race without handicaps. The data is just the fastest laps recorded on the servers, so not specifically from races: it can be practice or drafting (even bump drafting). Also note that PB splits and TB splits are stored separately.
For reference, here is my own WE1 XFG data in human-friendly format:
Nov 21 22:07:43 Stats for: RG^7M@CI3K (jackson93)
Nov 21 22:07:43 Track/Car: WE1+XFG Laps: 16 (67/139)
Nov 21 22:07:43 PB/Date: 2:03.53 26.01.2023 20:17
Nov 21 22:07:43 46.27 26.25 51.01 (1:12.52) 21/139
Nov 21 22:07:43 TB/Raced: 2:03.20 26.01.2023 20:26
Nov 21 22:07:43 46.14 26.05 51.01 (1:12.19) 15/139

Attached files
stats.zip - 1.8 MB - 33 views
Hi, very cool idea, tnx for investing time in this.

It's gonna be painful to extract the info you're looking for as there are way too many variables. Generalizing this kind of problem is better done by using some form of machine learning method for pattern recognition. You're probably not gonna be able to find a simple analytic function to fit it, but it's worth giving it a shot.

I suggest adding one more data point as a standard car on the lower end of the performance spectrum, like an XFG. Then make a linear fit between XFG and F1. Evaluate all cars against this fit; the deviations may give you something that could be proportional to BOP.
Quote from Gutholz :Are you trying to plot which property makes a car fast?

Yes, I eventually want to do exactly that. Based on the early results posted above, you can take the standardized coefficients as a metric for the impact of each property. Power influences performance the most, followed closely by lift, then drag, mass, weight distribution and lastly engine size.

Quote from Gutholz :I am not sure what I am looking at here. Many numbers for sure.
Some random thoughts:

Races often already have some BOP applied, that makes it hard to compare.
Even if no BOP is used, the track, distance, mandatory pit stops etc might be chosen in a way to make it balanced. Typical example for TBO class might be that FXO is fastest but drivers eventually need to slow down to save tires.

Indeed, the performance over the course of a race differs quite a bit from a single lap performance. Plus lap counts and pitstops impact the performance quite significantly. I tried to model it in the past for the Open Endurance Cup Five and probably need to rehash some of the ideas to make it work here. https://www.lfs.net/forum/thread/94326-Open-Endurance-Cup-Five---Balance-of-Performance-Discussion

Quote from Gutholz :Another idea:
Maybe try to figure out something about the difficulty/technicality of car/track combos by plotting a histogram of all uploaded times.
For example for XFG@Kyoto Oval I would expect that all the top times are very close to each other. On the other hand something like Fo8 at South City might have bigger gaps.
There might be a large plateau at XFG@Bl1 of newbies using default setups or other such things.

I think getting all the uploaded times may be the real challenge, since I basically just copied and pasted the times from the LFSWorld tables into Excel. For XFG@BL1 there are 347 hot laps uploaded across 14 tables. Other combos won't have as many uploaded times as the demo combos, but such an analysis would still take a lot of effort.

Quote from Gutholz :The ratio between "average speed" and "theoretical car top speed" (aka how much of a lap is at full throttle or high speed) could be interesting, too.
We know that Kyoto Oval is the fastest track, but which track is second fastest and which is the slowest?
(Also for different cars)

You can already indirectly get a ranking of the different track speeds by looking into the descriptive statistics as these are practically the mean speeds for each track. It does not factor in the theoretical car top speed. I didn’t look into this, but is the value in the car selection screen independent from the used setup? If so, it would be a thought to include it, although it should prove to be highly correlated with other covariates as well.

Quote from Gutholz :Mildly related:
Do you know of a way to get all WR times and their upload date?
It might be interesting to see how WR times have improved over time, how big the improvements were, which WR was standing longest until it got beaten etc.

I would probably need the whole LFSWorld hot lap database for this. It would definitely make for interesting facts, but to be honest I probably won't be able to answer these questions with the approach I am pursuing.

Quote from rane_nbg :Hi, very cool idea, tnx for investing time in this.

It's gonna be painful to extract the info you're looking for as there are way too many variables. Generalizing this kind of problem is better done by using some form of machine learning method for pattern recognition. You're probably not gonna be able to find a simple analytic function to fit it, but it's worth giving it a shot.

You're right, machine learning is a great tool for this kind of analysis. However, it is also a black box, and it is not easy to understand why particular results emerge. That is why I decided to go this route. I also wanted to get some hands-on practice with JASP.

Quote from rane_nbg :I suggest adding one more data point as a standard car on the lower end of the performance spectrum, like an XFG. Then make a linear fit between XFG and F1. Evaluate all cars against this fit; the deviations may give you something that could be proportional to BOP.

At its core, this method is also a linear fit, but against multiple properties at once. The resulting fit is the best estimate for the given data set in the least-squares sense. I also used the logarithm so that the speed can be written as a product of multiple factors, as follows:


log(s) = β₀ + β₁ × m + β₂ × p + β₃ × w + β₄ × z + β₅ × l + β₆ × d + β₇ + ...
e^(log(s)) = e^(β₀ + β₁ × m + β₂ × p + β₃ × w + β₄ × z + β₅ × l + β₆ × d + β₇ + ...)
s = e^(β₀) × e^(β₁ × m) × e^(β₂ × p) × e^(β₃ × w) × e^(β₄ × z) × e^(β₅ × l) × e^(β₆ × d) × e^(β₇) × ...

(s = speed, m = mass, p = power, w = weight distribution front, z = engine size, l = lift, d = drag with β being the corresponding coefficients from the multiple linear regression, β₀ = intercept and β₇ onwards being the coefficients of the categoric factors)
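The equivalence of the sum form and the product form can be checked numerically. The sketch below uses the covariate coefficients from the M₁ rows of the table. The car values are hypothetical, and their units are assumed to match whatever the input dataset uses. The Power coefficient is only given rounded to 0.001 in the table and the track factor is not shown, so the absolute speed is only indicative.

```python
import math

# Intercept and covariate coefficients copied from the M1 rows of the table.
INTERCEPT = 3.127
COEFFS = {
    "mass": -1.901e-4,
    "power": 0.001,  # rounded in the table, so results are approximate
    "weight_front": 0.004,
    "size": 2.189e-5,
    "lift": 1.092e-4,
    "drag": 7.009e-4,
}


def speed_from_sum(car, factor_coeffs=()):
    """s = exp(b0 + b1*m + b2*p + ... + factor coefficients)."""
    log_s = INTERCEPT + sum(COEFFS[k] * car[k] for k in COEFFS)
    return math.exp(log_s + sum(factor_coeffs))


def speed_from_product(car, factor_coeffs=()):
    """s = exp(b0) * exp(b1*m) * exp(b2*p) * ... * exp(factor coefficients)."""
    s = math.exp(INTERCEPT)
    for k in COEFFS:
        s *= math.exp(COEFFS[k] * car[k])
    for beta in factor_coeffs:
        s *= math.exp(beta)
    return s


# Hypothetical car at the reference levels; values are illustrative only.
car = {"mass": 600, "power": 200, "weight_front": 45,
       "size": 2000, "lift": 1000, "drag": 300}

FWD = -0.045  # Drivetrain (FWD) coefficient from the table

print(speed_from_sum(car), speed_from_product(car))
print(speed_from_sum(car, [FWD]) / speed_from_sum(car))  # equals exp(FWD)
```

Both functions return the same speed up to floating-point rounding, and switching a categorical factor simply multiplies the result by the exponential of its coefficient.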

I got the latter idea from hedonic regression in real estate valuation; I recently took part in a workshop on that topic. The situation there is similar: statistical analysis is applied to very heterogeneous data sets to obtain general models, which are then applied back to specific objects. For the application in LFS there are at least metric values and precise categories, which in theory should make it easier than for a real estate index.
Quote from jackson93 :I don't know if this will be any use for you but I got 5 years worth of best lap...

I saw there are a few laps included with mods. Is there a good way to translate the "cryptic" names given by AIRIO to the HEX codes of the mods system? I naively tried =CODE() in Excel for each character individually for that purpose, without meaningful results.
I wish I could help you with this. Only piece of info I have (per @limac92):
Case study:

For "XFG" insim sends [88, 70, 71, 0],

88 is ASCII "X",
70 is ASCII "F",
71 is ASCII "G".

For E-Challenger (FA2989) insim sends bytes [137, 41, 250, 0],
but we cannot treat these values like ASCII characters because we'd get "‰)ú\0",
but,

137 is in HEX 89,
41 is in HEX 29,
250 is in HEX FA.

So I guess you would need to convert each character to its byte value and then to hex. The second part is easy-ish. No idea about the first.

But I wouldn't count on mod stats from FM's Airio. We have to clean every non-ASCII entry out of the database every so often because they cause Airio to fail to launch after a restart.
Thanks for the clarification. So the bytes sent are in reverse order compared to the code of the mod.

Even though the mod data might not be reliable, I'll give it a shot and test it first before discarding it.
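The reversed byte order from the case study can be sketched as follows. Two things here are assumptions rather than confirmed behaviour: that a name whose three bytes are all ASCII letters or digits is a standard car, and that Airio stores the raw bytes as Windows-1252 (`cp1252`) characters. The function names are hypothetical.

```python
from typing import Sequence


def decode_car_name(raw: Sequence[int]) -> str:
    """Turn the first three car-name bytes into 'XFG' or a mod code like 'FA2989'."""
    body = bytes(raw[:3])
    # Assumption: three ASCII letters/digits mean a standard car name.
    if all(b < 128 and chr(b).isalnum() for b in body):
        return body.decode("ascii")
    # Otherwise treat it as a mod: reverse the bytes and write them as hex.
    return body[::-1].hex().upper()


def decode_airio_name(text: str, encoding: str = "cp1252") -> str:
    """Decode a 'cryptic' name, assuming it holds the raw bytes as cp1252 text."""
    return decode_car_name(text.rstrip("\0").encode(encoding))


print(decode_car_name([88, 70, 71, 0]))    # standard car from the case study
print(decode_car_name([137, 41, 250, 0]))  # mod bytes from the case study
print(decode_airio_name("‰)ú\0"))
```

A mod whose three bytes all happen to be ASCII letters or digits would be misread as a standard car under this assumption, and byte values that cp1252 leaves undefined raise an encoding error, so names that fail to decode are best checked by hand.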
