I've got a funny problem in online racing. Quite often, none of the others can see my car. If they switch to me, I'm in the garage, while I'm still racing...
- ping to servers is OK (dedicated and non-dedicated)
- no lag
- connection bars at the bottom left are OK
- I can chat with others
- they see my finish time
- in the replay I'm stuck in the garage or at some place on the track
- for me, it's a normal race and I see everything I should
- happened with older patches as well
I'm in "TCP-Mode", because I need CTRL-T to switch my VR on.
Most times it happened in team races:
- join server (qualy mode)
- box; race; box etc.
- restart server (e.g. new track, new settings)
- I'm in the race like always, but no one can see me (I'm on pos. 1 on their list)
- leave server and rejoin -> all is OK again
I think it's time to revive this thread. Since the "AutoX Compo Test Server" appeared, i have found new useful information to help scawen dig into this problem and see that it's not an ISP bug.
In the following link there are my considerations about this bug: http://www.lfsforum.net/showthread.php?p=174173#post174173
since i experienced this problem on the autox server, it is clear this is not an ISP bug, according to: LFS network debug, 100% ping successful, 100% traceroute successful.
what the autox server made me realize is that servers with InSim applications are more prone to this problem. in other words the problem is somewhere in lfs, but InSim seems to stress it more and make it more frequent (the autox server is the worst in this regard).
so, since then i went around lfs servers to test my hypothesis:
- on regular servers (i.e. without InSim applications) it has never happened so far...but i remember it happening in the past
- on servers with InSim applications it is more frequent, and the more InSim messages, the more frequent the problem is...
- the problem is triggered by spectate/join actions, i.e. if i connect to a server and never go to spectate, the problem seems never to arise!!! (<- important!!!)
- my guess is that under a udp flood some lfs packet queue overflows and lfs fails to process some udp packets from clients...this is irrelevant for udp position packets, but missing the udp command for "join race" causes this problem. Another possibility is that the server is running on a windows pc with an unpatched tcpip.sys that limits connections. in that case the problem would be in windows, but i don't think that is the case, since the lfs network debug clearly shows that udp packets arrive correctly at lfs!!!
so to summarize:
- udp packets arrive to lfs according to lfs network debug, ping and traceroute (-> verified!)
- "invisible car bug" is triggered by spectate/join sequences of actions
- "invisible car bug" is more frequent on servers with InSim applications
to all people who have experienced this bug (especially since almost everyone who tried the Autox compo test server experienced it at least once...), please report it, because scawen needs statistical proof that this is an lfs bug.
I'm subscribing to this thread so you can discuss it here when you get more info, and I'll always read your comments. I'll take a look at this problem, this morning, and see if I can make any sense out of it.
to everyone: please report whether the server has any insim application. i think it will be important to know if it is:
- the deciding factor,
- a contributing factor,
- or irrelevant
...from what i have experienced so far, the second point seems to be true.
spectate/join actions trigger this, disconnect/reconnect fixes it.
I've seen this only once, and it was with 99% certainty a 'regular' server. It was some months ago, so I cannot remember which one. I thought it was lag, so I didn't pay much attention (but it didn't print lag texts, so it wasn't).
Hmm.. maybe. I've tested my insim app on dedicated and non-dedicated hosts for a long time and haven't seen it there. Nobody has complained to me about this, but I cannot be sure that nobody else has experienced it. It's been running on a dedicated host for a few months now and the admin hasn't told me of any problems. But of course this doesn't prove anything.
To be specific: is it bad if the InSim app sends many messages, but not if the server sends them to the app? Or are both bad?
About the last point("...never go to spectate, the problem seem to never arise"): Interesting, but how to prove that, unless you know what causes it and you can track the logic out of the code? Otherwise you can only prove it wrong (by finding a situation where it does not hold). Could be true but hard to prove.
Sounds like a good candidate for the cause. But doesn't the server ACK these packets (I'm not sure)? Of course that's not a 100% sure way (all the resends could get lost too, but then the client should disconnect IMHO). Maybe it could be tested by building a 'box' (relay) which loses some of the packets on their way to LFS (like those where a player joins a race).
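A minimal sketch of such a lossy 'box', in Python and for UDP only. Everything here is an assumption made for illustration: the `relay_once` helper, the `should_drop` predicate and the idea that a "join" packet can be recognised by its first bytes are hypothetical, since the real LFS packet format is not described in this thread. It also forwards in one direction only (guest to host), so replies from the host would not find their way back.

```python
import socket


def relay_once(relay_sock, server_addr, should_drop):
    """Receive one datagram on relay_sock and forward it to server_addr.

    The datagram is silently discarded when should_drop(data) is true,
    which simulates a packet lost on its way to the host.
    Returns True if the datagram was forwarded, False if it was dropped.
    """
    data, _guest_addr = relay_sock.recvfrom(2048)
    if should_drop(data):
        return False  # simulated loss: the host never sees this packet
    relay_sock.sendto(data, server_addr)
    return True
```

Pointing the guest at the relay's address and calling `relay_once` in a loop, with a predicate that drops only the suspect packets, would show whether losing them reproduces the invisible car.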
If they didn't, how could you even join the server? Or do you mean that you were able to reproduce it and netstat showed that the correct packets arrive even when the car goes into 'invisible mode'? Yes, that would prove it's a server side issue (like a queue overflow).
When you enter a server you are in spectate mode, and then you just click "join race" to join it. So isn't this what everybody needs to do? I guess you mean that after joining you go back to spectate and then join again, maybe repeating this multiple times, right? But basically this could happen even if you don't jump constantly between spectate and race states (the first 'join' packet gets lost).
I'm as eager as anybody else to find the cause. Especially if InSim app coders need to do something to put less load on the servers.
it is a very subtle bug. i only became aware of it because in the old days someone tried to spectate me, couldn't see me and told me exactly what he saw. he didn't make it simple by just saying "you are lagging"; in fact it is not lag
i think it matters when insim is enabled on the server; on the client it is irrelevant (actually i don't use insim apps)
the underlying reason could be of any kind. some examples that come to my mind right now:
1-i connect and join -> lfs adds an element to the vector/array of clients that are connected and in the race. let's assume lfs handles this array in its own way and not with stl containers or whatever. IF removing elements from this array is not "thread safe", two threads may modify it at the same time with unpredictable results, i.e. adding an element (rejoining the race) could fail.
another possibility: if lfs uses timer threads to handle/poll/check things related to joined racers, or uses semaphores/mutexes, some events may be lost when a timer times out, or thread synchronization may break because a semaphore was not updated before a timer expired.
let's not mention compiler bugs...here the possibilities are endless. scawen uses VC, which is pretty non-standard, and it is not rare for perfectly correct code to crash when built with it...i have somewhere a sample app (less than 20 lines) that crashes if compiled with VC 2005 because allocated pointers are freed for no reason, while it runs perfectly if compiled with mingw-gcc and borland compiler 5.5
i don't really think this could be the cause, because join/spectate are handled over the tcp connection and car positions by udp packets...lfs has (i guess) some timer thread to check the lag of each client and, if it is too high, it disconnects you. according to this and to lfs network debug, udp position packets are received by lfs and "recognised"; it is just that the server ignores them, maybe because of one of the guesses i made above
as i said before, lfs network debug is enough for this, because it proves udp position packets arrive. it is of course a section of code called very soon after the server socket callback function is called, and the handling of the packets logically comes after the network debug code, so the problem has to be somewhere in between
it never happened to me that the first join didn't work, except on the autox compo test. again, a leak in the handling of the array of joined people is a good candidate for the cause
i don't have anything against insim apps, i find some server apps quite useful. i just guess that having insim enabled on the server may "stress" some parts of the code more and cause some sort of avalanche effect
You seem to have more information about the LFS server code. Since i don't have access to it (or have any information about it), this is more like blind guessing.
"unpredictable" indeed. Often they die a horrible death.
With TCP, okay, I see. I got the idea from your earlier posts that it's UDP too. Like I said, this is guessing when we don't even know the facts about the code.
You only said that spectate/join causes it, so i suppose it can happen from the first join operation then too. I never played the autox test thingie and just here trying to follow your thinking
I've only noticed that when somebody joins a race it causes a small break when game freezes for a fraction of a second. No idea what it is doing then.
But since it happens on servers which do not use InSim apps, it cannot be because of them (even if they might make it happen more often). I actually wonder whether they burden the server much more than 1 guest player does.
i don't have any access to lfs code. i'm just guessing, using my intuition and experience, in the hope that the ideas that come out of this thread could point in the right direction...let's just remember that until we find a way to make this problem reproducible, scawen cannot work a miracle. this is obviously one of the most subtle bugs and hard to investigate
yeah i noticed that too and that is the thing that made the thing about semaphores come to my mind.
i don't know if it would be too time consuming for the devs, but if it were possible for scawen to compile a debug version of the server in "verbose mode" to be used for the autox compo test, i would put myself at his disposal to run stress tests on such a server...hopefully with verbose logs the investigation would be a bit easier.
Important question : Is that only on the OTHER players MPR, or even in the replay you save on your own computer? Is the replay in your second post recorded on your own computer or someone else's?
What does that mean? What is VR? [ EDIT : OK I checked your other posts. I think I understand, you have to press CTRL+T for the VR, and unintentionally this turns on the TCP packets. But anyway that probably isn't affecting the invisible car problem ]
Which UDP packets arrive at which instance of LFS? I don't know what you have verified - you have verified that UDP packets arrive at the guest? Or you have verified with 100% certainty that the PosPackets from the invisible guest are arriving at the host?
I am working on this question at the moment, that's why I've asked GAS-Hugo about the MPR in my previous post. If the player whose car becomes invisible to the other players saves an MPR on his own computer, and even in that locally recorded replay he can't see his own car, that means his computer is not even trying to send PosPackets! However, if he CAN see his car in the MPR saved on his local machine, that means the PosPackets are being prepared, but either they are failing to be sent, or the host is not receiving them, or it is not forwarding them.
Could you dig up that one mpr (FOX at BL1 on the OCC server, if I recall correctly)? The one where your car was invisible. I could find my replay of the event as well, so Scawen might want to look at them?
I'll go find my mpr version now...
EDIT: found it. Recorded on 14.7 and the original name of the file was BL1R_race_5L_5r_5F_3.mpr. BL1R with the LRF class cars. In my replay you can see no Honey until he reconnected.
EDIT2: sent a PM to Honey, hope she bothers to answer
EDIT3: She did not have the replay. But anyway, in the attached replay you see the whole thing from my perspective. The invisible car bug has never happened to me though. Maybe it has something to do with the fact that the players who usually suffer from this are entering the race from the pits right before the race starts...?
when it happened to me, i could see myself in the mprs i saved. unfortunately i think i recently purged my mpr folder when i installed the last patch. i promise to provide you with an mpr tonight or tomorrow
when i enable the lfs network debug feature, it shows me my own "ping/lag". i assumed (by logic) that this means the server has acknowledged my udp pospackets...so to be clear, it seems to me that udp pospackets arrive correctly from my client at the server. they are surely sent, as my firewall logs the rule that checks the lfs udp output (the log shows the server i am connected to as the endpoint). of course i cannot see the content of the packets, but i assume they are pospackets because the flow of transmitted bits is continuous.
by means of the windows ping and tracert commands i made sure there was no packet drop between me (client) and the server host.
so if lfs network debug actually acknowledges pospackets, then the situation is the following:
the client affected by the "invisible car issue" (let's call it client1) correctly sends pospackets, and the server receives them BUT doesn't "relay" the pospackets of client1 to any other client...it just keeps them for itself...
so the problem seems to be in the "server code". the only way the problem could be in the client is if for some reason the client sends malformed pospackets and the server rejects them from processing after the acknowledge code.
i promise a couple of mprs asap from both views: invisible client, other client
@Hyperactive
unfortunately i don't have it. it would have been good to have both, but tonight or tomorrow i'll try to arrange that with someone's help when it happens again
[ TECHNICAL POST WARNING - only read if you are interested in UDP protocol ]
If we find that the car does not vanish in a replay saved on the computer of the player whose car vanished on the other people's computers, then I am starting to suspect an issue with the ephemeral port number. I've just been reading up about how these work because until today I didn't know the name for them and my knowledge of them was quite vague.
An ephemeral port number is a port number assigned by the operating system when you make a TCP connection and also when you send UDP packets (I guess in the UDP case, because it is connectionless, this is when the first packet is sent). It is the port that the packet is "from" and the host can reply to that port number and it will be routed to the computer on your local network that sent the original packet. The router can also play around with the ephemeral port numbers for reasons of avoiding conflicts.
LFS servers assume that the ephemeral port number for any particular guest will never change after the initial connection. They ignore incoming packets from unknown ports. I think I need to insert some code so that, instead of ignoring those UDP packets, the server compares them with its known connections and checks whether a guest with the same IP address has recently appeared to stop sending packets. In that case it can strongly suspect that the guest's ephemeral port number has changed. By examining the data inside the packets it could verify that the correct car index is included and, at that point, reassign the "known" port number for that guest to the new number.
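The deduction described above could look roughly like the following Python sketch. This is not LFS code: the `Guest` record, the `route_packet` function and the `STALE_AFTER` threshold are hypothetical names and values chosen only to illustrate the idea of matching an unknown port to a guest that has gone silent.

```python
from dataclasses import dataclass

STALE_AFTER = 2.0  # hypothetical: seconds of silence before a guest counts as "stopped sending"


@dataclass
class Guest:
    car_index: int
    ip: str
    port: int         # the source port the host currently knows for this guest
    last_seen: float  # time the last UDP packet arrived from ip:port


def route_packet(guests, ip, port, car_index, now):
    """Return the guest a UDP packet belongs to, or None if it is ignored."""
    # Normal case: the packet comes from a known ip:port pair.
    for guest in guests:
        if guest.ip == ip and guest.port == port:
            guest.last_seen = now
            return guest
    # Unknown port: look for a guest with the same IP that has gone silent
    # and whose car index matches the one carried inside the packet.
    for guest in guests:
        silent = now - guest.last_seen > STALE_AFTER
        if guest.ip == ip and guest.car_index == car_index and silent:
            guest.port = port  # reassign the "known" port to the new number
            guest.last_seen = now
            return guest
    return None  # still unknown: ignore the packet as before
```

Requiring both the silence and the matching car index keeps a second guest behind the same IP address from taking over a connection that is still alive.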
Aha!I´ve got a funny online-racing problem. Not always, all others can't hear me! If they switch to me - i´m in a garage somewhere, while i´m still racing...
- Ping to servers are okeleydokely (dedicated and nondedicated)
- no slags
- correction-bars down left , ok
- i can only chat with others
- they will hear finish time
- in replay I stuck a garage at a place on track
- for me, it´s a normal race, hearing everything I should
- happened with older pentioners as well
I´m in "thienylcyclohexylpiperidine-Mode", because I need CTRL-T to hear my VR on.
Most times it happened in sTeam-Races:
- join server (qually-mode)
- box; race; box etc.
- restart server (like new track, new settings like)
- i´m in race like allwayslike, but noone can hear me (i´m on pos. 1 on their list)
- left server and rejoin -> I can hear again
i understand, you use the ephemeral port as a sort of "client id" to keep pospackets small and use bandwidth efficiently. surely there are robust solutions, but the first that comes to my mind right now would be to send the ephemeral port change from client to server via the tcp connection...to make myself clear: the client periodically checks its ephemeral port (e.g. every 5 seconds), and if it has changed it sends an "ephemeral port changed" event through the tcp connection to the server...dunno if that may be useful.
...so at this point maybe we have to hope that the cause really is (as it seems to be) the ephemeral port change. i guess this would be less time consuming to fix than any other possible cause
No. Because the guest does not actually know the ephemeral port. I wouldn't know how to access it. Maybe that's possible within the local OS but even so, it's still useless because the hardware *router* can change the ephemeral port. That is to allow two computers on the same network to use the same ephemeral port (one computer doesn't cooperate with another to find unused ephemeral ports). The router stores a look up table and repairs the reply packets before routing them to their destination. The result is that the ephemeral port that the server will see, is not necessarily seen by your client.
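To illustrate the router's look up table, here is a toy model in Python. It is only a sketch: the `NatRouter` class and its methods are hypothetical, and the `expire` step assumes that a router may forget an idle mapping and hand out a new public port later, which is one way the port seen by the host could change without the guest noticing.

```python
class NatRouter:
    """Toy model of a router that rewrites the source port of outgoing UDP packets."""

    def __init__(self, first_public_port=40000):
        self.next_port = first_public_port
        self.outbound = {}  # (lan_ip, lan_port) -> public port seen by the host
        self.inbound = {}   # public port -> (lan_ip, lan_port), used for replies

    def translate_out(self, lan_ip, lan_port):
        """Return the public port the host will see for this LAN endpoint."""
        key = (lan_ip, lan_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.outbound[key]

    def translate_in(self, public_port):
        """Return the LAN endpoint a reply should be routed to, or None."""
        return self.inbound.get(public_port)

    def expire(self, lan_ip, lan_port):
        """Forget a mapping (assumption: routers may do this after idle time)."""
        public_port = self.outbound.pop((lan_ip, lan_port), None)
        if public_port is not None:
            del self.inbound[public_port]
```

Two computers using the same local port get different public ports, and after `expire` the same computer comes back with a new one, while its own local port never changed.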
I'm happy enough with my "deduction" solution and it's 100% compatible so I will implement it anyway.
Well it's a bit of a guess. Your evidence is also suggesting it's at the server end, because your own replays are saved correctly, and you report that the UDP packets are sent correctly.
One further piece of evidence that would strengthen this theory, is if spectating and joining does not help in any way, and the ONLY way to become visible again is to leave the host and rejoin.
Very interesting idea. But why do some people claim that it happens more often when running InSim apps? In this case they shouldn't make it any worse, I think.
@CSU1
"- no slags" - Hmmm... Is this a big problem for you?
of course you know your code better than anyone, and we know your solutions are always state of the art. i'm just happy you found the cause
btw accessing/querying the ephemeral port depends on the framework/api you use, but it's not too hidden...at the very least a netstat call would give you the answer. the most common name for ephemeral ports is "local endpoint". i remember winsock having it, and .net gives that info for sure (but we don't care about .net). iirc in winsock it was quite hidden how to retrieve it.
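As a small illustration in Python (not what LFS uses), the standard `socket` module exposes the local endpoint through `getsockname()`. The helper name `local_and_observed_port` is made up for this sketch. It sends one datagram over loopback and returns both the port the sender reports and the port the receiver actually sees.

```python
import socket


def local_and_observed_port():
    """Send one UDP datagram over loopback.

    Returns (port reported by the sender, source port seen by the receiver).
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The OS assigns the ephemeral port when the first datagram is sent.
        sender.sendto(b"pos", receiver.getsockname())
        local_port = sender.getsockname()[1]
        _data, (_ip, observed_port) = receiver.recvfrom(16)
        return local_port, observed_port
    finally:
        sender.close()
        receiver.close()
```

On loopback there is no router in between, so the two numbers agree. Behind a NAT router the second number can differ, which is why knowing the local value does not tell the guest what the host sees.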
exactly
maybe coincidence, or maybe, since insim apps are quite "verbose", they make it easier for the client os to end up in the conditions where the ephemeral port changes...that does not surprise me so much, even though it's not completely clear why.