
Weblog Entry

Character Sets

May 23, 2003

It’s a thrilling experience to see your words translated into a foreign tongue, even more so when the language is completely unrecognizable to you. With the first Zen Garden translation (Greek, thanks to Akis Apostoliadis) just about ready for prime time, and the second one (French, thanks to Nic Steenhout) under way, I’ve had to start thinking about foreign characters.

French is a snap, since all non-English characters like â, ç, ö and so forth have well-supported character entity codes, as well as even better-supported numeric equivalents. Most of the Germanic and other European languages are built into modern operating systems, so they’re easy to handle.
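To make the entity idea concrete, here is a minimal Python sketch that looks up both forms for the characters mentioned above. It is only an illustration, not part of the Garden’s tooling, and the helper names `to_named_entity` and `to_numeric_entity` are made up for this example.

```python
from html.entities import codepoint2name

def to_numeric_entity(ch: str) -> str:
    """Return the decimal numeric character reference, e.g. '&#226;'."""
    return f"&#{ord(ch)};"

def to_named_entity(ch: str) -> str:
    """Return the named entity if HTML defines one, else the numeric form."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else to_numeric_entity(ch)

# The three characters from the paragraph above (â, ç, ö), written as
# escapes so this file stays pure ASCII.
for ch in "\u00e2\u00e7\u00f6":
    print(to_named_entity(ch), to_numeric_entity(ch))
```

The numeric form works for any character, which is why it is the safer fallback when a named entity does not exist.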

But Greek… that’s where I had no experience. Would I have to save the Garden as double-byte Unicode and double the file size? I’ve played with Unicode before and, while it displays wonderfully in browsers that support it, my current ASP setup is unable to process it. As far as I can tell, IIS is incapable of parsing Unicode files, which makes them useless for anyone serving pages from Windows. I’m finally moving the Garden to its own domain and switching to PHP in the next few days, but I have even less experience with Apache.
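The file-size worry can be checked with a few lines of Python. This is a rough sketch that assumes “double-byte Unicode” means UTF-16 and compares it with UTF-8 on a tiny, made-up piece of Greek markup; `SAMPLE` and `encoded_sizes` are names invented for the example.

```python
# Sample markup: a <p> element holding an eight-letter Greek word,
# written with escapes so this file stays pure ASCII.
SAMPLE = "<p>\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac</p>"

def encoded_sizes(text, encodings=("utf-16-le", "utf-8")):
    """Return the number of bytes `text` occupies in each encoding."""
    return {name: len(text.encode(name)) for name in encodings}

print(len(SAMPLE), "characters")
for name, size in encoded_sizes(SAMPLE).items():
    print(f"{name}: {size} bytes")
```

For this 15-character sample, UTF-16 takes 30 bytes because every character costs two, markup included, while UTF-8 takes 23 because only the Greek letters cost two bytes each. A page that is mostly ASCII markup would grow far less under UTF-8 than under a double-byte encoding.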

Long-term planning aside, I don’t have to deal with server-side concerns yet, at least. Greek is supported well enough that even though I couldn’t see the Greek characters as I edited, assigning my XHTML the proper character set renders a viewable Greek version of the Garden even in my English browser.

I believe I’ve declared my XHTML character set incorrectly, though. I used the old standby:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

That seems to work at the document level, but according to the crib notes I should also include an XML declaration:

<?xml version="1.0" encoding="EUC-JP"?>

This particular encoding is just the test example, but I’ve been running around in circles trying to find out what to substitute for ‘EUC-JP’ to let my XML know that, yeah, I would rather see Greek in there. There’s no reference or link in the XHTML spec, so I did my best to pull up information on XML, which makes me break out in a cold sweat. A fruitless search ended up pointing to the Unicode site, where I found a brief Web FAQ.

I’m still not seeing my answer, but maybe I’m not being observant enough. Is there a place where I can find out how to encode an XML document in the Greek language? I will have to keep looking this evening.

The other catch is that if I reinsert that XML declaration, IE6 gets thrown into quirks mode. I suppose this isn’t terrible since box model hacks abound in the Garden, thanks to IE5. In a few years, when IE5 is no longer an issue, we might be revisiting this one.

Reader Comments

digital.death says:
May 23, 05h

I believe you would use:
<?xml version="1.0" encoding="iso-8859-7"?>

May 23, 05h

I agree. EUC-JP is just a strange-sounding encoding for Japanese; you can replace it with iso-8859-7 for Greek text.

I’d also use:
<html xmlns="" xml:lang="el">
This “el” comes from the Greek word for Greek, “Ellinika”.

There are still some problems with unclosed spans, and some acronym tags seem to be placed incorrectly.

Otherwise it is a nice translation of a great project!

Dave S. says:
May 23, 06h

Thanks guys, I’m using your advice. That iso-8859-7 is a real head-slapper, now that you point it out. Funny what a bit of lateral thinking can do for you.

The validation errors were purely my own, but they’re fixed. Now I just have to figure out how to link this in to the rest of the Garden…
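
For anyone who wants to sanity-check the iso-8859-7 suggestion from the comments, here is a small Python sketch. It assumes the standard library’s ElementTree and a made-up one-element document; `GREEK` and `serialize_greek` are names invented for the example and have nothing to do with how the Garden itself is built.

```python
import xml.etree.ElementTree as ET

# An eight-letter Greek word, written with escapes so this file stays ASCII.
GREEK = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"

def serialize_greek(text, encoding="iso-8859-7"):
    """Serialize <p>text</p> to bytes in the given encoding.

    ElementTree writes the XML declaration itself when the encoding is
    something other than UTF-8 or US-ASCII.
    """
    root = ET.Element("p")
    root.text = text
    return ET.tostring(root, encoding=encoding)

data = serialize_greek(GREEK)

# First line of the output is the XML declaration naming the encoding.
print(data.splitlines()[0].decode("ascii"))

# A parser that honors the declaration recovers the original Greek text.
print(ET.fromstring(data).text == GREEK)

# The same bytes read as iso-8859-1 come out as different (Latin) characters.
print(data.decode("iso-8859-1") == data.decode("iso-8859-7"))
```

The last comparison is the reason the declaration matters: the bytes on disk are identical either way, and only the declared character set tells the reader whether they are Greek or accented Latin letters.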