Hi,
This post is for anyone who grabs LFS hostnames from pubstats or another source and displays them on their own web pages (or anywhere else).
Patch Q brings support for different codepages, as you probably know already. This also means that hostnames can use codepages other than the usual Latin 1, so to display them correctly you will have to support those codepages as well.
To save you some trouble, I'll explain here how to display mixed codepages on web pages. I don't know how to display them in binary applications, as my knowledge doesn't reach that far, but the same principle may well apply.
As you may know, LFS uses only 8-bit character encodings and can display multiple codepages on the same page. HTML, however, doesn't support mixing multiple 8-bit codepages on one page, so the idea is to convert hostname characters into Unicode values and display them as numeric character references of the form &#xxxxx;
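To make the &#xxxxx; form concrete, here is a minimal sketch of turning one Unicode value into such a reference. The helper name to_entity is made up for this illustration and is not part of the attached tables; the zero-padded five-digit format simply mirrors the &#xxxxx; pattern.

```php
<?php
// Hypothetical helper (illustration only): formats a Unicode value as a
// zero-padded decimal numeric character reference.
function to_entity(int $codepoint): string {
    return sprintf("&#%05d;", $codepoint);
}

// 1046 is the Unicode value of the Cyrillic capital letter Zhe (U+0416).
echo to_entity(1046), "\n"; // prints &#01046;
```

A browser renders that reference as the Cyrillic letter no matter which 8-bit charset the page itself declares, which is what lets several codepages share one page.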
Character conversion is done via one-to-one conversion tables. The attached zip contains "cp_unicode_tables.php", which holds these tables and has to be included.
The following PHP example only converts characters with a value higher than 127.
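<?php
// Conversion tables: $cp_tables[<code page letter>][<byte value>] = Unicode value
include ("cp_unicode_tables.php");

// Code page indicators
// ^L = Latin 1
// ^G = Greek
// ^C = Cyrillic
// ^J = Japanese
// ^E = Central Europe
// ^T = Turkish
// ^B = Baltic

// Converts an LFS hostname with mixed code pages into plain ASCII plus
// numeric character references (&#xxxxx;) for display in HTML.
function codepage_convert ($str) {
    global $cp_tables;
    $newstr = "";
    $current_cp = "L"; // default code page is Latin 1
    $len = strlen ($str);
    for ($i = 0; $i < $len; $i++) {
        // "^" followed by a known code page letter switches the code page.
        // The length check avoids reading past the end on a trailing "^".
        if ($str[$i] == "^" && $i + 1 < $len && isset ($cp_tables[$str[$i + 1]])) {
            $i++;
            $current_cp = $str[$i];
            continue;
        }
        $decimal = ord ($str[$i]);
        if ($decimal > 127 && isset ($cp_tables[$current_cp][$decimal])) {
            // Bytes above 127 depend on the code page: emit the Unicode value
            $newstr .= sprintf ("&#%05d;", $cp_tables[$current_cp][$decimal]);
        } elseif ($decimal > 127) {
            $newstr .= "?"; // the table has no mapping for this byte
        } else {
            $newstr .= $str[$i]; // values up to 127 are copied unchanged
        }
    }
    return $newstr;
}
?>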
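If you want to see the conversion at work without the attachment, here is a small self-contained sketch. Everything in it is an assumption for illustration: demo_tables and codepage_convert_demo are made-up names, and the three table entries are hand-picked on the assumption that ^L, ^C and ^G correspond to Windows-1252, Windows-1251 and Windows-1253. The real tables in "cp_unicode_tables.php" may be laid out differently.

```php
<?php
// Hypothetical excerpt of the table layout: byte value => Unicode value.
// Code page assignments (CP1252, CP1251, CP1253) are assumptions.
function demo_tables(): array {
    return array(
        "L" => array(0xE9 => 233),   // Latin 1: e with acute accent
        "C" => array(0xC6 => 1046),  // Cyrillic: capital Zhe
        "G" => array(0xE1 => 945),   // Greek: small alpha
    );
}

// Condensed variant of the conversion loop that takes the tables as a
// parameter instead of using a global.
function codepage_convert_demo(string $str, array $tables): string {
    $out = "";
    $cp = "L"; // start in Latin 1
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // "^" plus a known code page letter switches the active code page
        if ($str[$i] === "^" && $i + 1 < $len && isset($tables[$str[$i + 1]])) {
            $cp = $str[++$i];
            continue;
        }
        $byte = ord($str[$i]);
        if ($byte <= 127) {
            $out .= $str[$i];                                 // ASCII passes through
        } elseif (isset($tables[$cp][$byte])) {
            $out .= sprintf("&#%05d;", $tables[$cp][$byte]);  // mapped byte
        } else {
            $out .= "?";                                      // unmapped byte
        }
    }
    return $out;
}

// A hostname that switches to Cyrillic, then to Greek.
echo codepage_convert_demo("^CServer \xC6 ^G\xE1", demo_tables()), "\n";
// prints: Server &#01046; &#00945;
```

The same byte value can map to different characters depending on the last ^ indicator seen, which is why the loop has to track the current codepage while it walks the string.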
Good luck! I hope this explains it well enough.