It started to go really bad: loads of people with unexplainable loss of connection, people having problems joining the server, etc. Somehow it takes some hours (buffer leak?), but it hasn't led to LFS client crashes yet. With the news of the new test patch I'll leave all this for now and do some testing again after that is released.
I don't think there is a clear limit on how many buttons should be sent. The InSim programmer should indeed spread the sending out a little so there isn't too much of an overload within a short time (e.g. avoid sending a screenful of buttons to all guests at once in a single burst).
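To illustrate spreading it out, here is a minimal sketch in Python of queueing button packets and draining the queue at a limited rate instead of bursting a whole screenful at once. The send_packet function, the per-tick budget and the tick length are assumptions for the sketch, not part of any real InSim library.

import time
from collections import deque

# Assumed budget: at most 16 button packets per 0.05 s tick; tune to your host.
BUTTONS_PER_TICK = 16
TICK_SECONDS = 0.05

button_queue = deque()

def queue_button(packet_bytes):
    # Store a prepared button packet instead of sending it immediately.
    button_queue.append(packet_bytes)

def drain_queue(send_packet):
    # Call this from the main loop; sends at most BUTTONS_PER_TICK packets.
    for _ in range(min(BUTTONS_PER_TICK, len(button_queue))):
        send_packet(button_queue.popleft())

# Main loop sketch:
# while running:
#     drain_queue(send_packet)
#     time.sleep(TICK_SECONDS)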
BUT the report is not just that the connection can be overloaded, but that the LFS clients can crash. A crash is not acceptable even if an external program is causing the overload. The crash address or a simple method to cause a crash would help me track it down.
What is setting the TCP buffer limit then? I increased the OS buffer limits a long time ago. Messed around with it but that didn't change anything. I played with those buffer settings because of the somewhat annoying TCP WOULDBLOCK message, which I don't like in the logs.
---
Back to sending too many buttons at once. Chuck, I know that (now) and I was forcing it this weekend while the server was very busy.
Normally it goes like this:
1) sending a massive amount of buttons at a very high rate to everyone (it's the custom race result ending "screen")
2) some people lose connection
3) the network debug bar rises into the red for all connected clients and stays red for many, many minutes even after a race restart
4) new massive flow of buttons
5) some others lose connection
6) ... (this situation repeats itself for some time)
7) connected LFS clients start to crash (LFS stopped working), especially the clients which did not lose their connection and did not reconnect
8) I have not managed to reproduce 7 yet, so sadly no crash address. And because it's a bit of a disturbance on the servers, I am not so motivated to test this for as long as I did before (when I did not know what the cause was).
That number 3 in your list above is the sort of thing that should be impossible with the new version.
So it shouldn't get worse and worse like that. If you did a button flood, that could cause loss of connections (I guess), but the host and anyone who did not get disconnected should then recover, because the packet buffers are done in a completely different way now.
While designing the code for the lag info reporting, I thought of a simple TCP send buffer to reduce the number of TCP packets (and therefore bandwidth as well). It's just a way to wait for a tiny fraction of a second before sending a packet to a guest, when in fact another one will be following it immediately. Might as well stick the second one on the end of the first one and send them with a single send. This should help reduce the number of physical TCP packets sent and reduce the TCP bandwidth. It's a bit like what the Nagle algorithm does but under control. All buffered packets will be sent after each processing frame, every 100th of a second if there are any to be sent, so the maximum wait for this would be 0.01 sec. Usually it should run faster though... it's more like the old way is "wrong" and this will be the right way to do it. Note that I am not talking about the UDP packets which are the most time critical.
It then occurred to me that there are a lot of situations where this will help, especially when a host is busy and packets are flying around all over the place. But most particularly, when an InSim button was sent to all guests, this button was sent by the LFS host in a TCP packet to each guest. That's fine, but when another button was sent immediately afterwards, this was also sent individually to each guest in turn. Using the new send buffer, several buttons for each guest will be stored in a buffer and sent in batches to each guest one by one. So for example, 10 buttons to 20 guests would be sent in 200 physical TCP packets in the old version. In the new version it would be sent to the guests in only 20 packets.
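To make the buffering idea concrete, here is a rough Python sketch (not LFS's actual code; the Guest class and the frame-flush call are assumptions) of coalescing small per-guest packets into one physical send per processing frame:

import socket

class Guest:
    def __init__(self, sock: socket.socket):
        self.sock = sock
        self.out_buffer = bytearray()

    def queue(self, packet_bytes: bytes):
        # Append the packet instead of sending it straight away.
        self.out_buffer += packet_bytes

    def flush(self):
        # Send everything queued this frame in a single send call.
        if self.out_buffer:
            self.sock.sendall(self.out_buffer)
            self.out_buffer.clear()

def end_of_frame(guests):
    # Called once per processing frame (every 0.01 s in the description above):
    # 10 buttons queued for each of 20 guests become 20 sends instead of 200.
    for guest in guests:
        guest.flush()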
I also spotted a possible reason for the reported disconnections and crashes. Even in B8 they might be forced to disconnect but probably not crash. This will be automatically fixed as well by the new buffer.
Sounds nice so far, I hope you're not running into problems with fragmentation.
We're already up for a test
Some notes on InSim implementation (might not be that relevant for cargame, but for other cruisers): we store hash values per player per button, so only real updates are physically sent to the client. Some buttons are veeery static and don't need to be sent on each cycle. Unless the player presses Shift+I, of course, which clears that cache. (That logic would also fit well into the actual LFS server.)
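A minimal Python sketch of that caching idea; send_button is a hypothetical helper standing in for however your InSim code actually sends a button, and hashing the style and text together is just one possible cache key:

import hashlib

button_cache = {}   # (ucid, click_id) -> hash of the last sent style/text

def send_if_changed(send_button, ucid, click_id, text, style):
    # Skip the physical send when this button already shows the same content.
    digest = hashlib.md5(f"{style}|{text}".encode()).hexdigest()
    key = (ucid, click_id)
    if button_cache.get(key) == digest:
        return False                      # unchanged, no packet needed
    button_cache[key] = digest
    send_button(ucid, click_id, text, style)
    return True

def clear_cache(ucid):
    # Forget everything for one connection, e.g. after the player presses Shift+I.
    for key in [k for k in button_cache if k[0] == ucid]:
        del button_cache[key]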
No, I have a performance table at the end of every race. Although nobody really understands what it is yet (it's a variant of Elo chess rating without the losing bit, like in Trackmania), it's being sent 40 seconds before the race restart. In the current situation I can't customize it for everyone because that leads to 32 x (32x4) = 4096 TCP packets (roughly). As I understand it now, it only leads to 32 TCP packets... About the same, I think, as I'm doing now by announcing just a global array of data to everyone (or maybe even less than what I currently transmit with this 'buttonforeveryone' InSim option).
---
I understand some more optimizations are going on, which is fine
If I understand you (correct me if I'm wrong) each client - assuming 32 clients - sees 32 rows of 4 buttons before a race start?
If so, the server still sends all 4096 packets to clients even if you're sending the (32x4) buttons once to the global UCID=255, as the server still has to send separately to each client - even if the different clients are receiving the same button ID+Content.
Each single button sent to UCID=255 actually sends 32 packets total (or whatever the current number of connections - could be up to 47).
If you were to make each of those 4096 buttons unique, there would only be extra packets sent InSim -> Server, but the same number Server -> Client.
Example:
(32x4) buttons sent to UCID=255
128 packets InSim -> Server
128 packets Server -> each Client
4096 packets Server -> Clients in total.
(32x4) buttons sent to each client in turn
4096 packets InSim -> Server
4096 packets Server -> Clients in total.
Assuming the InSim is local to the server, the number of packets InSim <-> Server should be a lot less sensitive than Server <-> Client as latency is near zero, so AFAIK it shouldn't matter much if you're sending 128 or 4096 packets locally.
The above is the case in the current 0.6B version; of course the new patch could vastly reduce the number of packets in all cases - except a single button (global or unique) sent to all clients, which would still be 1 packet per client.
tl;dr
In the current version, it doesn't matter whether you're sending a global or unique button to all clients; the server still sends out the same number of packets:
Total packets = buttons per client * number of clients
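Spelled out in a few lines of Python for the 32-client, 32x4-button scenario above (the numbers, not the mechanism, are the point here):

# Packet counts in the current (unbatched) version for the example above.
clients = 32
buttons_per_client = 32 * 4                             # 128 buttons on each screen

# Global buttons (UCID=255): one packet InSim -> Server per button,
# but the server still fans each one out to every client.
insim_to_server_global = buttons_per_client             # 128
server_to_clients = buttons_per_client * clients        # 4096

# Unique buttons per client: only the InSim -> Server side grows,
# the Server -> Clients side stays the same.
insim_to_server_unique = buttons_per_client * clients   # 4096

print(insim_to_server_global, insim_to_server_unique, server_to_clients)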
It should be a lot more difficult to lose connections to guests by sending a lot of small InSim buttons. The new buffer and error correction are a lot better... there were definite problems with the old one that could come up if you saw WOULDBLOCK.
I'm sure you will cause disconnections if you really send several KB of buttons to each guest all at once, but I think the buffer should deal with sensible button sending. You might see WOULDBLOCK if you send too many buttons at once but it should recover safely.
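For the InSim side, a common way to survive a WOULDBLOCK on your own non-blocking socket is to keep the unsent tail and retry it on the next tick rather than dropping it or blocking the main loop. A minimal sketch, assuming a non-blocking socket and a pending byte buffer kept by the caller:

import errno
import socket

def try_send(sock: socket.socket, pending: bytearray) -> bytearray:
    # Send as much of `pending` as the OS buffer accepts; return the remainder.
    try:
        sent = sock.send(pending)
        return pending[sent:]
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return pending          # send buffer full, try again next tick
        raise                       # a real error, let it surface

# Usage: keep `pending` between ticks and call try_send() each frame until it is empty.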