When we want to strip the name from an IS_MSO message, we can use TextStart, and this works fine as long as the sender's name only contains Latin characters. If I use Japanese characters in my name, for instance (I know it is pretty common for teams and players to stylize text with Japanese katakana, e.g. チSマ for FSR), then TextStart has a greater value than it should.
Here's an example using "Cyk R", "Cyk ﾏ" (half-width "ma") and "Cyk マ" (full-width "ma") as nicknames, each sending a message containing only "test":
full message: ^7Cyk R ^7: ^8test
to utf16(?): ^7Cyk R ^7^c ^8test
TextStart = 14
buffer[+TextStart]: R ^7: ^8test
substr(+TextStart): test
custom regex: test
full message: ^7Cyk ﾏ ^7: ^8test
to utf16(?): ^7Cyk ^JÏ ^7^c ^8test
TextStart = 16
buffer[+TextStart]: Ï ^7: ^8test
substr(+TextStart): st
custom regex: test
full message: ^7Cyk マ ^7: ^8test
to utf16(?): ^7Cyk ^J荽 ^7^c ^8test
TextStart = 17
buffer[+TextStart]: } ^7: ^8test
substr(+TextStart): t
custom regex: test
For reference, buffer[+TextStart] is just me cutting the first TextStart bytes off the buffer and converting the rest again, while substr(+TextStart) is how I assume TextStart is supposed to be used: removing the first TextStart characters from the converted string.
A single half-width Japanese character shifted TextStart by 2, possibly because the ^J codepage escape is counted in TextStart but does not appear in the converted string. Full-width Japanese characters cause an additional offset, as seen in the 3rd test.
I think this means any codepage change in the player's name will make TextStart wrong for the converted string, and full-width Japanese characters (at the very least) throw it off further. A name like 日本人じゃないけど will cut 11 characters from the message.
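To illustrate where those numbers could come from, here is a rough Python sketch. It assumes TextStart counts bytes in the raw buffer, and that a Japanese name is sent as ^J followed by CP932 (Shift-JIS) bytes. Both are guesses based on the tests above, not something I have confirmed, and encode_name and offsets are hypothetical helpers.

```python
def encode_name(name: str) -> bytes:
    """Assumed raw encoding: ASCII as-is, "^J" before the first Japanese character."""
    out = bytearray()
    in_japanese = False
    for ch in name:
        if ord(ch) > 0x7F and not in_japanese:
            out += b"^J"  # assumed codepage switch escape
            in_japanese = True
        out += ch.encode("cp932")  # half-width kana: 1 byte, full-width: 2 bytes
    return bytes(out)


def offsets(name: str) -> tuple[int, int]:
    """Return (prefix length in raw bytes, prefix length in decoded characters)."""
    raw_prefix = b"^7" + encode_name(name) + b" ^7: ^8"
    decoded_prefix = "^7" + name + " ^7: ^8"
    return len(raw_prefix), len(decoded_prefix)


print(offsets("Cyk R"))        # (14, 14)
print(offsets("Cyk \uff8f"))   # half-width ma: (16, 14)
print(offsets("Cyk \u30de"))   # full-width ma: (17, 14)
```

Under these assumptions the first value matches the TextStart reported in each of the three tests, while the second stays at 14, which is what substr would actually need. The same model gives a difference of 11 for 日本人じゃないけど (2 for ^J plus 1 extra byte for each of the 9 characters).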
With that said, it is possible to use a regular expression to do the same job without relying on TextStart. That's what the custom regex line shows in the tests above.
\^7%s \^7: \^8 # Replace %s with player name
(?<!\\)\^ # Matches carets ^ in the player's name that are not already escaped, so they can be escaped before building the pattern
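A minimal Python sketch of the regex approach, assuming the message has already been converted to a normal string and starts with "^7name ^7: ^8" as in the tests. strip_sender is a hypothetical helper name. re.escape takes care of carets and any other special characters in the name, so the separate caret pattern isn't needed in this version.

```python
import re


def strip_sender(message: str, player_name: str) -> str:
    """Remove the "^7<name> ^7: ^8" prefix from a converted IS_MSO message."""
    # Escape the name so carets, dots, brackets etc. are matched literally.
    prefix = re.compile(r"\^7" + re.escape(player_name) + r" \^7: \^8")
    # count=1 so only the sender prefix is removed, not a later lookalike.
    return prefix.sub("", message, count=1)


print(strip_sender("^7Cyk R ^7: ^8test", "Cyk R"))            # test
print(strip_sender("^7Cyk \u30de ^7: ^8test", "Cyk \u30de"))  # test
```

If the prefix isn't found, the message comes back unchanged, so a caller can compare input and output to detect that case.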