We recently internationalized our application, and one of our requirements was to send SMS messages in different languages. SMS supports the GSM 03.38 7-bit alphabet, and for characters that can't be represented in that pseudo-ASCII you can send messages encoded in UTF-16 instead.
Our messages come in and have to be dispatched on the fly. Although a message is sometimes meant to be in ASCII, it can contain enough text in, say, Japanese that it has to be encoded in UTF-16 to make any sense.
The solution is pretty straightforward. Our Java code first checks whether the message is encodable as ISO-8859-1. If it is, the code transliterates the message to GSM 03.38 and strips out any characters that didn't transliterate properly; otherwise the message is sent as UTF-16.
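The check and transliteration could look something like the sketch below. It is not the original code: it assumes a hand-written table of the GSM 03.38 basic character set (the extension table, which covers characters such as `{`, `[` and `€`, is left out) and approximates transliteration by stripping diacritics with `java.text.Normalizer`. The class and method names (`SmsEncoding`, `isIso88591`, `toGsm`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class SmsEncoding {
    // GSM 03.38 default alphabet, basic table only (assumption: the
    // escape-prefixed extension table is not supported in this sketch).
    static final String GSM_BASIC =
        "@£$¥èéùìòÇ\nØø\rÅå"
        + "Δ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
        + " !\"#¤%&'()*+,-./0123456789:;<=>?"
        + "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
        + "¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // True if every character of the message exists in ISO-8859-1.
    // A fresh encoder is created per call because encoders are not thread-safe.
    static boolean isIso88591(String message) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(message);
    }

    // Map each character to the GSM basic alphabet: keep it if it is already
    // there, otherwise try its base letter with diacritics removed (NFD),
    // and drop it if neither form is in the alphabet.
    static String toGsm(String message) {
        StringBuilder out = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (GSM_BASIC.indexOf(c) >= 0) {
                out.append(c);
                continue;
            }
            // NFD splits e.g. 'Á' into 'A' followed by a combining accent.
            String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
            char base = decomposed.charAt(0);
            if (GSM_BASIC.indexOf(base) >= 0) {
                out.append(base);
            }
            // Characters with no GSM equivalent are stripped.
        }
        return out.toString();
    }
}
```

For example, `toGsm("Ángel")` yields `"Angel"`, because `Á` is missing from the GSM alphabet but its base letter is present, while a character with no GSM base, such as `{`, is simply dropped.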
Of course, there are other steps that I didn't include, like trimming the string to 140 characters. The UTF-16 hex string is limited to 280 hex characters: a single SMS carries 140 bytes of payload, and each byte is written as two hex digits. Since every UTF-16 code unit takes two bytes (four hex digits), that works out to 70 code units per message.
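The hex conversion and the 280-character limit could be sketched as follows. This assumes the gateway expects big-endian UTF-16 (`UTF_16BE`, no byte-order mark) written as uppercase hex; the class, constant and method names (`SmsLimits`, `MAX_OCTETS`, `truncate`, `toUtf16Hex`) are hypothetical. Truncation works on UTF-16 code units and backs off by one unit when the cut would split a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

class SmsLimits {
    // User-data capacity of a single SMS, in bytes.
    static final int MAX_OCTETS = 140;

    // Keep at most maxUnits UTF-16 code units, never leaving a dangling
    // high surrogate at the end of the result.
    static String truncate(String s, int maxUnits) {
        if (s.length() <= maxUnits) {
            return s;
        }
        int end = maxUnits;
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    // Encode as big-endian UTF-16 and render each byte as two hex digits,
    // so 70 code units (140 bytes) become at most 280 hex characters.
    static String toUtf16Hex(String message) {
        String fitted = truncate(message, MAX_OCTETS / 2);
        byte[] bytes = fitted.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }
}
```

The same `truncate` helper could enforce the 140-character limit on the transliterated GSM text, which never contains surrogate pairs.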