This thread is over a year old, but I have found a working .NET solution for the bullet issue.
To summarize the problem:
Data is received in iso-8859-1 format. Some of its special characters do not display correctly when the page charset is UTF-8 (UTF-8 is an encoding of the whole Unicode character set, not a subset of it).
Converting from iso-8859-1 to UTF-8 maps the characters correctly, but some of them still do not display as expected.
    // Requires: using System.Text;
    Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
    Encoding unicode = Encoding.Unicode; // UTF-16 (little-endian)

    // Encode the string as iso-8859-1 bytes, then convert those bytes to UTF-16.
    byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
    byte[] destTextBytes = Encoding.Convert(iso8859, unicode, srcTextBytes);

    // Decode the UTF-16 bytes back into a char array.
    char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
    unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);
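This code converts the text into an array of Unicode (UTF-16) characters. Note that because textToConvert is already a .NET string, the round trip leaves every character up to U+00FF unchanged (anything above U+00FF cannot be represented in iso-8859-1 and is lost), so the bullet comes out as U+0095.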
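To check what the round trip does on its own, here is a minimal sketch that repeats the same steps and returns the resulting string. The class and method names (RoundTripDemo, RoundTrip) are my own, not from any library. Assuming the input only contains characters up to U+00FF, the output is identical to the input.

```csharp
using System.Text;

public static class RoundTripDemo
{
    // Same steps as the code above: string -> iso-8859-1 bytes
    // -> UTF-16 bytes -> string. Characters up to U+00FF survive unchanged.
    public static string RoundTrip(string text)
    {
        Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
        Encoding unicode = Encoding.Unicode;

        byte[] srcBytes = iso8859.GetBytes(text);
        byte[] destBytes = Encoding.Convert(iso8859, unicode, srcBytes);
        return unicode.GetString(destBytes);
    }
}
```

For example, RoundTripDemo.RoundTrip("a\u0095b") returns the same three characters it was given, including the U+0095.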
As pointed out by G. Dierckx:
Quote : ps: I also found something (wich isn't totally related to the problem) but the "bullet" in lfs..
I retrieve from hostprogress
unicode 0095 -> http://www.fileformat.info/info/unic...0095/index.htm
While a "real" bullet should be: 2022 -> http://www.fileformat.info/info/unic...2022/index.htm
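\u0095 (decimal 149) is what byte 0x95 becomes when it is decoded as iso-8859-1. Strictly speaking, U+0095 is a C1 control character; 0x95 is a bullet only in Windows-1252. It still shows up as a bullet in a browser because HTML parsers map &#149; (and most other numeric references in the 128 to 159 range) to the corresponding Windows-1252 character.
The named HTML entity for the bullet is &bull; (numeric &#8226;).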
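If you would rather store a real U+2022 than rely on the browser's handling of &#149;, one option (my own assumption, not what the function further down does) is to map C1 control characters back to the meaning Windows-1252 gives the same byte. The sketch below uses a hypothetical helper, C1Fixer.FixC1Characters, with a deliberately partial table; a complete version would cover the rest of 0x80 to 0x9F.

```csharp
using System.Collections.Generic;
using System.Text;

public static class C1Fixer
{
    // PARTIAL table: C1 control character -> Windows-1252 meaning of that byte.
    private static readonly Dictionary<char, char> Cp1252 = new Dictionary<char, char>
    {
        { '\u0080', '\u20AC' }, // euro sign
        { '\u0085', '\u2026' }, // horizontal ellipsis
        { '\u0091', '\u2018' }, // left single quotation mark
        { '\u0092', '\u2019' }, // right single quotation mark
        { '\u0093', '\u201C' }, // left double quotation mark
        { '\u0094', '\u201D' }, // right double quotation mark
        { '\u0095', '\u2022' }, // bullet
        { '\u0096', '\u2013' }, // en dash
        { '\u0097', '\u2014' }, // em dash
    };

    // Replaces characters found in the table; every other character is kept as-is.
    public static string FixC1Characters(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(Cp1252.TryGetValue(c, out char mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

Running the output through the entity step would then produce &#8226; for the bullet instead of &#149;.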
From a snippet on another board about manually converting special characters to HTML entities, I grabbed:
    // Escape every non-ASCII character as a numeric HTML entity (&#NNN;).
    StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));
    foreach (char c in destChars)
    {
        int value = Convert.ToInt32(c);
        if (value > 127)
            result.AppendFormat("&#{0};", value);
        else
            result.Append(c);
    }
    return result.ToString();
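The snippet relies on destChars and result from the surrounding code. Here is a self-contained sketch of the same escaping step; EntityEncoder and ToHtmlEntities are hypothetical names. It also writes a surrogate pair (for example an emoji) as one entity instead of two, which the per-char loop cannot do.

```csharp
using System.Text;

public static class EntityEncoder
{
    // Replaces every character above 127 with a decimal numeric character
    // reference (&#NNN;) and leaves ASCII untouched. Like the original
    // snippet, it does NOT escape '&', '<' or '>'.
    public static string ToHtmlEntities(string text)
    {
        var result = new StringBuilder(text.Length + text.Length / 10);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c <= 127)
            {
                result.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                // Combine the surrogate pair into a single code point.
                result.AppendFormat("&#{0};", char.ConvertToUtf32(c, text[i + 1]));
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                result.AppendFormat("&#{0};", (int)c);
            }
        }
        return result.ToString();
    }
}
```

For example, ToHtmlEntities("a\u0095b") gives a&#149;b. Text that may contain &, < or > still needs ordinary HTML escaping on top of this.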
Combining the two gave me this function:
    // Requires: using System; using System.Text;
    public static string iso8859ToUnicode(string textToConvert)
    {
        Encoding iso8859 = Encoding.GetEncoding("iso-8859-1");
        Encoding unicode = Encoding.Unicode;

        // Round-trip the text through iso-8859-1 bytes into UTF-16 characters.
        byte[] srcTextBytes = iso8859.GetBytes(textToConvert);
        byte[] destTextBytes = Encoding.Convert(iso8859, unicode, srcTextBytes);
        char[] destChars = new char[unicode.GetCharCount(destTextBytes, 0, destTextBytes.Length)];
        unicode.GetChars(destTextBytes, 0, destTextBytes.Length, destChars, 0);

        // Replace every non-ASCII character with a numeric entity (&#NNN;).
        StringBuilder result = new StringBuilder(textToConvert.Length + (int)(textToConvert.Length * 0.1));
        foreach (char c in destChars)
        {
            int value = Convert.ToInt32(c);
            if (value > 127)
                result.AppendFormat("&#{0};", value);
            else
                result.Append(c);
        }
        return result.ToString();
    }
This successfully converted my bullet in iso-8859-1 to &#149;, which displays as a bullet on a UTF-8 page.
I have not done extensive testing yet, so I do not know whether this will work in all cases.