Something worth noting regarding the replay in my screenshot: that was a replay of a race I'd just been in. I don't think I'd restarted LFS in the meantime, so it may have been a download that was still stuck from the race itself.


Edit: Said race happened *before* Victor turned off CloudFlare, even if I couldn't load the replay sometime after. It's still possible that turning off CF has solved the issue.

Wild stab in the dark in case it is a CloudFlare issue - could it be that the first time someone requests a particular skin, CF's attempt to load and cache it does something weird? All of the events I've been in lately will likely have had multiple brand-new skins uploaded for them.
Quote from Victor: FWIW on my end, yesterday I also downloaded a bunch of league replays and had no issues downloading any skin. Tested on plain LFS version U.

(ps I upped the max upload size for MPR to 32MB - could be handy for you)

Have not tested it again today, but in general I only had the issues when I was on a server, someone joined the session and his/her skin wouldn't load. Skins of other drivers joining afterwards would not get downloaded either. When I left the server, either to join another server (maybe even the same one again) or, like in Degats' screenshot, to open a replay afterwards, it would get stuck.

When starting LFS and then opening a replay directly, I have not encountered the problem so far.
OK. That negative number on the replay skins download screen is probably not memory corruption. It looks a bit obscure but rather than being a true counter, the number comes from this:

original_number_to_download - number_in_queue + 1

So what this really indicates is too many skins in the queue. From what you are saying I'm prepared to believe this problem may only start when you are in game. One skin gets stuck somehow, a timeout doesn't solve it, then the queue just stays occupied from then on. Interesting thoughts about it possibly being related to new skins. I'll spend some time reading the code and see how the timeout could fail.
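
To see how that formula can go negative, here is a minimal sketch in Python. It is not LFS code; the function name and the example numbers are made up for illustration. It only evaluates the expression above, once for a normal queue and once for a queue holding more entries than the replay originally asked for.

```python
def displayed_skin_number(original_number_to_download: int, number_in_queue: int) -> int:
    """Number shown on the replay skin download screen, per the formula above."""
    return original_number_to_download - number_in_queue + 1


# Normal case: 5 skins requested and 5 still queued, so the first skin is in progress.
print(displayed_skin_number(5, 5))

# Hypothetical stuck case: the queue still holds entries from earlier,
# so it is longer than the number this replay asked for.
print(displayed_skin_number(5, 9))
```

With more entries in the queue than were originally requested, the result drops below 1 and then below zero, which would match the negative number in the screenshot.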

Please do report if it happens again today. If the cloudflare routing was a DNS thing then it might have taken a while to propagate. So as far as I know the problem could still be related to cloudflare. I'm not trying to excuse LFS from being at fault, though.
I'm on 0.6U11, I've been unable to trigger it again since my last post - I've just joined almost every populated server in order. It occurs to me that the error I got last night might have been an old skin still stuck (I usually just put up with white cars rather than fix it), and that cloudflare might have been the problem after all. I will keep testing when I can.

PS, Scawen: I didn't mean to suggest that you didn't code a timeout, just that fixing it would be a tweak worthy of going to an official version. I know a lot of people are waiting for that so that Lazy can be updated. The meaning was lost in the endless edit :)

Re: disabling downloads at your end - on one hand we can do that individually if it's a problem for us and can keep it turned on if we're OK with restarting to work around the issue. On the other, new users don't necessarily know that they can do that, and/or won't realise that that's what they've got to do. Big downsides for both options :(
I can see one thing that would make the skin thread get stuck.

That is, if while still getting the headers, FD_READ is reported but recv returns 0.

[EDIT: This may be the case but LFS may well be causing that to happen - see my next post]

This case is checked while downloading the skin data but it seems that 0 is not checked as an error value while getting the headers. Only SOCKET_ERROR is treated as an error.

So the skin thread can end up in a loop where it calls WSAEnumNetworkEvents which says (lNetworkEvents & FD_READ) is set. So apparently there is data to receive and LFS updates the timeout timer. Then the recv function returns 0 bytes. LFS never fills up the buffer enough to read another header, as it's always adding zero bytes to it.

It sounds like something that shouldn't happen. Being told that there is something to read, but actually there are zero bytes waiting? But maybe it is normal and LFS should be reacting to that return value of zero as an error and that would be the end of the bug.

I'm not certain this is happening but that's the only possibility I can see so far.
From what I can tell, it is technically possible and valid to send/receive a packet with a zero byte payload*, though I have no idea why anyone would try to send a packet like that.

It's possible CloudFlare does some header only communication for some reason, which doesn't require a payload. They do weird things with headers sometimes...



*I was researching this while trying to debug some Sim Broadcasts stuff
OK scrap that, I think I might have the answer.

I think it may be that LFS is calling recv, with a zero 'len' value, in other words, "give me some data, up to a maximum of zero bytes".

It looks like this can be the case if the size of the header is more than 768 bytes (in game) or 1024 bytes (at replay start). That is the size of the buffer it uses to read the header.

In that case LFS reads all the lines from the header, and if they aren't all there yet, it goes round again to recv more data. Only trouble is, it doesn't move the lines out to make space for more data, and so that's why it asks recv for a maximum of zero bytes.

I can reproduce this scenario by building a version with a tiny buffer. But is that reasonable for an http header, to be over 768 bytes? If so then I think we have the answer.
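
The scenario can be sketched roughly like this in Python. This is a simulation under stated assumptions, not the LFS source: `read_headers`, its parameters and the fake "recv" (a slice of a byte string) are all hypothetical. It shows why a full buffer that is never compacted ends up asking for zero bytes, and how moving the parsed lines out frees the space again.

```python
def read_headers(response: bytes, buf_size: int, compact: bool):
    """Simulate reading HTTP header lines through a fixed-size buffer.

    response -- all bytes the (fake) server will send
    buf_size -- capacity of the header buffer
    compact  -- if True, already-parsed lines are removed to make room
    Returns (status, lines) where status is "done", "stuck" or "closed".
    """
    pending = response   # bytes not yet "received"
    buf = b""            # current contents of the header buffer
    parsed = 0           # offset of the first unparsed byte in buf
    lines = []
    while True:
        room = buf_size - len(buf)
        if room == 0:
            # A real recv with len == 0 returns 0 bytes every time: no progress.
            return "stuck", lines
        data, pending = pending[:room], pending[room:]   # fake recv(len=room)
        if not data:
            return "closed", lines
        buf += data
        # Parse every complete line currently in the buffer.
        while True:
            end = buf.find(b"\r\n", parsed)
            if end < 0:
                break
            line = buf[parsed:end]
            parsed = end + 2
            if line == b"":
                return "done", lines   # blank line ends the headers
            lines.append(line)
        if compact:
            buf = buf[parsed:]   # drop parsed lines so recv has room again
            parsed = 0


response = b"HTTP/1.1 200 OK\r\nA: 1\r\nB: 2\r\n\r\n"
print(read_headers(response, buf_size=20, compact=False))
print(read_headers(response, buf_size=20, compact=True))
```

The tiny 20-byte buffer plays the role of the "version with a tiny buffer" mentioned above. Even with compaction, a single header line longer than the buffer would still get stuck in this sketch.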
edit - replied too late, but anyway my two cents :)

I've always treated reading 0 bytes as a remote disconnect. In many languages that's actually the only way to tell if there was a remote disconnect. So if a socket select/poll/event says there's activity on the read buffer, but a read returns 0 bytes -> remote disconnected.
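
A small illustration of that rule, as a sketch in Python rather than the Winsock calls LFS uses. `read_or_detect_close` is a made-up helper name, and `socket.socketpair` just stands in for a real connection. Note it only covers the disconnect case: a read that asks for zero bytes also comes back empty.

```python
import select
import socket


def read_or_detect_close(sock: socket.socket, timeout: float = 1.0):
    """Wait for read activity, then read; an empty read means the peer closed."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return "timeout"
    data = sock.recv(1024)
    if data == b"":
        return "closed"   # activity reported but 0 bytes read -> remote disconnect
    return data


a, b = socket.socketpair()
b.sendall(b"hello")
first = read_or_detect_close(a)    # normal data
b.close()
second = read_or_detect_close(a)   # peer has gone away
a.close()
print(first, second)
```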
Victor, would you expect http headers over 768 bytes? It's supposed to be a header, not an essay. But maybe this is the way these days.
Headers can contain long things yes. Cookies can pile up for example, all in one header. But that's not the case on the skins.lfs.net domain.

Trying a request on lfsmanual.net which is still behind CF:

< HTTP/1.1 200 OK
< Date: Wed, 28 Oct 2020 15:33:34 GMT
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Set-Cookie: __cfduid=dc9140hced7c7ae3f57372d87a9ada2251703779214; expires=Fri, 27-Nov-20 15:33:34 GMT; path=/; domain=.lfsmanual.net; HttpOnly; SameSite=Lax
< X-Content-Type-Options: nosniff
< Content-language: en
< X-UA-Compatible: IE=Edge
< Vary: Accept-Encoding, Cookie
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: private, must-revalidate, max-age=0
< Last-Modified: Wed, 01 Apr 2020 10:22:26 GMT
< Vary: Accept-Encoding
< CF-Cache-Status: DYNAMIC
< cf-request-id: 061170e24000002cef2eb6c000000001
< Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=BSoxSfy4R6nmTsRwhaOuQghY6tfvpNs3DPGY45n39SooOIrjgIBiUqRvFuQ5T%2BBN7lkq%2BH%2FfVTvqmSqhdfMdiFfN97Tu7lorKRemqVejVxAK"}],"group":"cf-nel","max_age":604800}
< NEL: {"report_to":"cf-nel","max_age":604800}
< Server: cloudflare
< CF-RAY: 5e95b749ffbc2cef-LHR

There's a somewhat long header of about 200 bytes - but that's the longest there.

In the http spec there are no limits defined. But servers have something like 8k for max length of a single header. Browsers seem to go for 10k-256k (!) I guess just to be safe?

Anyway, 768 bytes sounds reasonable in this case. But you cannot guarantee there will never be headers longer than that.
It's actually the total length of all the header lines that is the problem, not an individual line. Because LFS has not been removing the read lines from the buffer when it goes round to ask for more data.

I'm thinking cloudflare these days sometimes adds so many lines to the header that the total amount of text is sometimes over 768 bytes. So that exposes the LFS bug that never showed up in the past.
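
As a rough way to check that, here is a sketch in Python that adds up the size of a whole header block and compares it with the two buffer sizes mentioned earlier. The helper name is hypothetical, and the three sample lines are just copied from Victor's lfsmanual.net response as an example, not a real skin download.

```python
IN_GAME_LIMIT = 768    # header buffer size in game, as stated earlier in the thread
REPLAY_LIMIT = 1024    # header buffer size at replay start


def header_block_size(header_lines):
    """Total bytes on the wire: each line plus CRLF, plus the final blank line."""
    return sum(len(line.encode("latin-1")) + 2 for line in header_lines) + 2


sample = [
    "HTTP/1.1 200 OK",
    "Content-Type: text/html; charset=UTF-8",
    "Server: cloudflare",
]
total = header_block_size(sample)
print(total, total > IN_GAME_LIMIT, total > REPLAY_LIMIT)
```

Feeding in every line of a real response would show whether the total, rather than any single line, goes past 768 or 1024 bytes.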
ahh yes then for sure 768 is insufficient.
I have a feeling that skin downloading will go fine now (as in, a normal header will not exceed 768 bytes) but will go wrong again if we need to use cloudflare. So maybe I should fix the bug (make it able to read an essay before the jpg data) and we could do the official release so people will be able to use LFS Lazy again, if Daniel is around to update it. :)
Official release with new tyre physics?
No. It would be test patch U11 + bug fix.
ah okay boss
Have been playing today and so far I haven't got skins stuck downloading, so I guess it's fixed, but I will keep an eye on it.
my 1-cent
Official Version

Monday 26 : problem with skin downloads during training on our server (RS)
Wednesday 28 : no problem during Races on same server
Yes, this seems to be temporarily fixed now. Thanks for the feedback.

Cloudflare adds additional headers and I think the bug is that LFS couldn't handle the extra amount.
https://support.cloudflare.com/hc/en-us/articles/200170986

LFS downloading skins in menu could handle up to 1024 bytes total for all headers and in-game it could only handle 768 bytes. After it goes wrong once, it gets stuck in that state and can no longer download skins.

Although we haven't seen a real example, I can reproduce the symptoms by making the header limit even smaller, then trying to get a skin from our server.

I'm fixing it in LFS though you shouldn't see the problem for now anyway.
I have the identical problem, but now. Last month I was testing DNSSEC and a resolver with the DNS server on my OpenWRT router. The VDSL modem settings have been identical for the last 3 years...

Is the problem only on the router?
I also tested dns-proxy but the problem persists ..

edit
About 5 hours ago I turned off my DoH DNS proxy .. and everything works perfectly with DoT. I have the problem with DoH DNS .. SNI off..
Test Patch U12 is now available and this problem should be fixed. https://www.lfs.net/forum/thread/93185

Downloads should be a bit faster too.


Quote from WestlY: About 5 hours ago I turned off my DoH DNS proxy .. and everything works perfectly with DoT. I have the problem with DoH DNS .. SNI off..

That is interesting because it seems you can recreate the skin issue on your system.

I'd be interested to know if U12 fixes that for you.