

July 25, 2005

Oh, the fun to be had with Unicode.

Unicode, for those still unfamiliar, is a universal character encoding standard, jointly developed by a consortium with dozens of corporate and individual members around the world. The Unicode character set currently contains over 70,000 characters, covering scripts from around the world in both modern and ancient forms.

A bit more background follows, and then some usage analysis. Interspersed throughout this article are various Unicode characters; it’s highly unlikely that your browser/OS knows what to do with all of them. So unless you’re running Safari, click on any set to view an image-based equivalent (which is a screenshot of how they actually do render in Safari).

Pictographs and Icons: ☣ ✈ ☎ ☠ and ♼

Although Unicode is meant for characters, the term ‘character’ appears to be used rather loosely. The scope of Unicode is quite broad:

Unicode covers all the characters for all the writing systems of the world, modern and ancient. It also includes technical symbols, punctuations, and many other characters used in writing text. The Unicode Standard is intended to support the needs of all types of users, whether in business or academia, using mainstream or minority scripts.


Intricate non-Latin characters: Ѭ ش だ ༃ and ௵

If you’ve never opened a character selection utility, there are a whole bunch of goodies waiting to be discovered. Both Mac OS X and Windows have utilities that give you access to the extended characters you won’t find on the average keyboard. Windows has the Character Map (All Programs ➝ Accessories ➝ System Tools ➝ Character Map), while OS X has the more functional and comprehensive Character Palette (System Preferences ➝ International ➝ Input Menu ➝ Character Palette). Both let you browse the higher ranges of Unicode that you normally can’t reach directly from the keyboard.
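If you would rather script the lookup than browse a palette, Python’s standard `unicodedata` module can report a character’s code point and official name. It can also find a character by name. This is a minimal sketch. `describe` is a hypothetical helper name, and the module’s data follows whatever Unicode version your Python build ships with.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a character's code point and official name, e.g. 'U+2623 BIOHAZARD SIGN'."""
    # ord() gives the code point; the default argument avoids a ValueError
    # for code points that have no name in the database.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# A few of the pictographs from this article.
for ch in "☣✈☎☠♼":
    print(describe(ch))

# The reverse direction: look a character up by its official name.
snowman = unicodedata.lookup("SNOWMAN")
print(describe(snowman))
```

The `U+XXXX` value printed here is the "Unicode number" used by all three embedding methods below.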

Strange decoration: ℄ ❣ ᴞ ℟ and ℋ

Given the current state of internationalization amongst modern operating systems and browsers, we’re coming close to actually being able to use some of these advanced characters on the web. The issues are similar to those of specifying typefaces for a browser: the user’s system must be able to render the character in question.

The biggest advantage I can see, aside from the obvious foreign language support, is the ability to include pictographs and icons within a document, without having to send an image over the wire. A little CSS styling means they can be set to any size, without consuming any more bandwidth, so the visual opportunities are promising.

The biggest caveat, though, is that any character you choose must have a representative glyph (the drawn form of the character itself) in the font you’re displaying it with. It’s unlikely that we’ll ever be able to rely on a specific glyph/font combination being present on every user’s computer. It’s also pretty much a given that the average typeface simply won’t have glyphs for all 70,000+ characters, considering the work that goes into producing even a basic Latin character set. Luckily, most of the standard web fonts, such as Verdana, Georgia and Arial, have more glyphs than the average font (thank you, Microsoft).

Religious and symbolic glyphs: ☦ ☬ ☤ ☮ and ☭

Unfortunately, browsers are not equal in their rendering. This article was composed while testing in Safari, where it looks absolutely great; other browsers, Firefox included, don’t fare as well. I can’t even begin to speculate why this is, as I had assumed Unicode character handling occurred at the operating system level. Suffice it to say, browsers differ widely when it comes to rendering the less common Unicode characters.

Although it’s probably easier and more reliable to embed these in an image for now, if you pay attention to the character sets and font combinations, it’s possible to render them as text within a browser. Once you have a Unicode number (which you can find in your character viewer of choice) you’ve got enough to start playing. There are at least three ways to include them as character data, and likely more:

  1. As a literal character, copied and pasted into your markup from the character viewer. Make sure your site uses a Unicode encoding like UTF-8. (Although this is true for all three options in this list.)
  2. As an escaped numeric character reference in your markup, using the &#xNNNN; format (for example, &#x203A; for ›), where NNNN is the character’s hexadecimal Unicode number (it can be fewer than 4 digits, e.g. &#xB7; for ·) (Unicode Character Lookup)
  3. As generated content via your CSS, using the :before and :after pseudo-elements and the "\NNNN" format (e.g. li:before {content: "\203A";})
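The three options above differ only in how the same code point is written out. This is a minimal Python sketch that derives each form from a single character, so the spellings stay consistent. The helper names `markup_forms` and `before_rule` are hypothetical, and so is the choice of selector.

```python
def markup_forms(ch: str) -> dict:
    """Write one character out in each of the ways listed above."""
    cp = ord(ch)
    return {
        "literal": ch,              # option 1: the character itself (the page's encoding must be able to represent it)
        "ncr_hex": f"&#x{cp:X};",   # option 2: hexadecimal numeric character reference
        "ncr_dec": f"&#{cp};",      # decimal variant of the same reference
        "css": f'"\\{cp:X}"',       # option 3: escaped string for the CSS content property
    }


def before_rule(selector: str, ch: str) -> str:
    """Build a :before rule that inserts ch as generated content."""
    return f'{selector}:before {{ content: "\\{ord(ch):X}"; }}'


forms = markup_forms("\u203A")
for key in ("ncr_hex", "ncr_dec", "css"):
    print(key, forms[key])
print(before_rule("li", "\u203A"))
```

Generating every form from `ord()` avoids a mismatch between a hex reference in the markup and a hand-converted escape in the stylesheet.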

Vaguely meteorological glyphs: ☼ ☁ ☂ ✮ and ☃

But it can be hit and miss since so many variables are involved. Still, the opportunity to experiment is there, and you may have noticed certain Unicode characters showing up in the wild — from the mundane © copyright symbols all over the web, to accents like John Gruber’s stars on the Daring Fireball Linked List (they appear after each link).

It might be nice to have a reference sheet with a bunch of reliable cross-browser glyphs that work today, were anyone up to experimenting. Otherwise, consider all this a simple look at the possibilities of what we might get to play with in a few years, if all the major browsers play ball. I’ll leave you with a couple of enhanced, gracefully-degrading generated content examples (screenshot here) to show how you might consider using these characters in a non-harmful way.

  • Mercury
  • Venus
  • Earth
  • Mars
  • Jupiter
  • Saturn
  • Uranus
  • Neptune
  • Pluto
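Generated content for a list like the one above can be built from the contiguous run of astronomical symbols, U+263F (Mercury) through U+2647 (Pluto). The original example’s markup isn’t reproduced here, so the `ul.planets` selector and the per-planet class names in this sketch are assumptions. In browsers without generated-content support, the list simply appears without the symbols.

```python
import unicodedata

# The planet names in list order; their symbols are the consecutive
# code points U+263F (Mercury) through U+2647 (Pluto).
PLANETS = ["mercury", "venus", "earth", "mars", "jupiter",
           "saturn", "uranus", "neptune", "pluto"]
FIRST_SYMBOL = 0x263F


def planet_rules() -> list:
    """Return one generated-content rule per planet (the selectors are hypothetical)."""
    rules = []
    for offset, name in enumerate(PLANETS):
        cp = FIRST_SYMBOL + offset
        # e.g. ul.planets li.mercury:before { content: "\263F"; }
        rules.append(f'ul.planets li.{name}:before {{ content: "\\{cp:X}"; }}')
    return rules


for rule in planet_rules():
    print(rule)

# Sanity check against the Unicode names (Venus and Mars are FEMALE SIGN / MALE SIGN).
print([unicodedata.name(chr(FIRST_SYMBOL + i)) for i in range(len(PLANETS))])
```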

The Road to Enlightenment

Littering a dark and dreary road lay the past relics of browser-specific tags, incompatible DOMs, and broken CSS support.

Update: It sounds like more than just Safari is doing justice to this page. Since it’s obviously not purely a browser issue, there are a number of reasons why your setup might render this page differently from mine, and why my copy of Firefox renders it differently than yours, which renders it differently than hers, which renders it… you get the idea. OS version, installed fonts, and international language options are all factors here. So, if you see this page as intended, great!

Kevin says:
July 25, 02h

You say that this gracefully degrades, but Firefox renders question marks for all the symbols it can’t figure out. Not so graceful looking to me.

nortypig says:
July 25, 02h

Great article, only I’m wondering whether its wider usefulness will be limited for some time to come, simply because you can’t tell whether your reader will get the symbols or the square. Language, after all, is about communication.

Although, don’t get me wrong, the applications of delivering to different languages are good stuff. I just mean it may still be quite limited in its usefulness to the average joe. Hardly what I would use to put sexy bullets on my site, for example. As far as I can tell, anyway.

July 25, 03h

This page is a handy guide to some of the more mundane Unicode characters that are pretty much guaranteed to work in Windows.

This is also a pretty good online character chart:

(more useful than the authoritative ones at because those are only available as PDFs)

Lastly, I hope it isn’t considered spam to link to my own blog here, but I did make a little project of organizing all the Unicode characters that correspond to characters of the IBM PC “OEM” character set back into their original code page order:

Paul D says:
July 25, 05h

Between Lucida Grande and Arial Unicode, both Mac and Windows users should have pretty good Unicode glyph coverage. Linux is lagging in this area.

For the browser’s part, the correct behaviour, whenever encountering a glyph not included in the CSS-specified font, should be to render it in a suitable alternate font. Obviously, that requires the browser to know which fonts make which glyphs and character sets available. Safari seems to shine here, while Firefox and Camino have mixed success. I’ve experienced a lot of frustration trying to make Firefox on Windows show certain characters I knew were present in Arial Unicode font.
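The fallback behaviour Paul describes can be modelled as a simple search. First try the fonts in the CSS `font-family` order, then any other installed font, and give up only when nothing has a glyph. This is a toy model, not how any particular browser actually implements it. The font names and coverage sets are made up for illustration.

```python
from typing import Dict, List, Optional, Set


def pick_font(ch: str, css_stack: List[str], installed: List[str],
              coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the font that should draw ch, or None if no font has a glyph."""
    cp = ord(ch)
    # 1. Respect the author's font-family order first.
    # 2. Then fall back to any other installed font that has the glyph.
    for font in css_stack + [f for f in installed if f not in css_stack]:
        if cp in coverage.get(font, set()):
            return font
    # 3. Nothing covers the character: the browser shows a box or "?".
    return None


# Hypothetical fonts and coverage, purely for illustration.
coverage = {
    "BodyFont": set(range(0x20, 0x7F)),   # basic Latin only
    "SymbolFont": {0x2603, 0x263F},       # a couple of pictographs
}
installed = ["BodyFont", "SymbolFont"]

for ch in ["A", "\u2603", "\u267C"]:
    print(hex(ord(ch)), pick_font(ch, ["BodyFont"], installed, coverage))
```

Step 2 requires the browser to know every installed font’s coverage. That requirement is the hard part Paul points to.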

ArcticBear says:
July 25, 05h

I get no “question mark” characters in iCab 3 beta 319 (on Mac OS X 10.3).

Sure it’s not finished and not generally available yet (at least to non-paying customers), but it’s coming along nicely.

Thomas Passin says:
July 25, 10h

They all looked good on my browser - Firefox 1.0.4, Windows 2000 SP4+. I did notice that the browser used utf-8 (as it would have had to, to show the glyphs).
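The next sketch shows why the declared encoding matters when glyphs are typed in literally. It takes one of the page’s glyphs and decodes its UTF-8 bytes under the wrong encoding. It assumes the misreading browser falls back to Windows-1252, which is common for Western-language pages.

```python
snowman = "\u2603"  # the snowman from the "meteorological" set above

# What a UTF-8 page actually sends over the wire: three bytes.
raw = snowman.encode("utf-8")
print(raw)                    # b'\xe2\x98\x83'

# A browser that guesses Windows-1252 turns each byte into its own character.
garbled = raw.decode("cp1252")
print(garbled)                # â˜ƒ

# Decoded with the correct encoding, the bytes become one character again.
print(raw.decode("utf-8") == snowman)  # True
```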

Kelson says:
July 25, 10h

Firefox on Windows seems to do pretty well. Only three glyphs seem to be missing: the recycle symbol, the fifth “non-latin” character, and the sideways ü. Opera on the same box is also missing non-latin #4, and IE has fewer than half. It doesn’t even manage the arrows in the text (though I’ve used the same character, I think, with a numeric entity). And the alchemy symbols? Not a chance!

Linux is a bit more dodgy, but that’s at least partly due to fonts. I’ve got a lot of symbols that show up small and/or blocky in FF and Opera, and not at all in Konqueror, so I suspect I only have them in bitmap fonts. I’ll have to install some missing fonts or try it on another box.

July 25, 10h

It’s funny… with Firefox 1.0.6 on WinXP SP2, I only manage to render 8 of the glyphs properly. But that could be because I don’t have the proper fonts loaded into my Suitcase…

On OSX 10.3.9, Safari looks great as you already said, and Firefox 1.0.6 looks fairly decent, except for a couple of characters.

I look forward to greater support for Unicode in the future.

July 25, 10h

It’s an issue of browser combined with OS combined with the fonts you have installed.

On the PC I’m on right now, I don’t have many fonts installed at all. As a result, I get only 9 glyphs and the Mars and Venus (male/female) symbols. Funny, those last two.

If I installed the Japanese/Asian language files on this PC, I bet I would see a few more, perhaps even a lot more.

Safari hooks up right into the OS, if I recall correctly. That would be why it shows everything for you whereas your Firefox doesn’t. Firefox uses its own font interpreters I believe, and may have a limited amount of fallback-attempts (unlike an OS itself) before giving up.

Mind you, my knowledge of the technical implementations of either browser is really quite limited, but this is my guess anyway.

Kelson says:
July 25, 10h

Aha! I just copied the Arial Unicode font from the Windows box to the Linux box, and now I get everything but the sideways ü on both FF and Opera, though the recycle symbol is still pulled out of some bitmap font and Opera seems to prefer the bitmap of the hiragana(?) symbol in line 2 to the vector that Firefox picks. I can’t get Konqueror to pick up the new characters, and I have way too much running on this box to log out and back in.

Andrew says:
July 25, 10h

“OS X has the more functional and comprehensive Character Palette.”

…which is generally easily available within most apps under Cmd+T, or the Edit menu as “Show Character Palette.” No need to dig through System Preferences to see it.

Thomas says:
July 25, 11h

They all seem to show up fine in Panther, both in Safari and Firefox. I’m running Mac OS 10.3.9 with Firefox 1.0.6 & Safari 1.3.

Thomas says:
July 25, 11h

Oops, scratch that… looks like the Phone and the Recycle icon aren’t showing up correctly, and the fifth icon from the left in the “Intricate non-Latin characters” section.

Hope that’s helpful.

Clay says:
July 25, 11h

I’m running Firefox on Windows, as well. I see all the characters except the recycling symbol and the last two non-latin characters. Those are some sweet characters, though, on the screenshot. Sometimes I wish English looked like that.

Hsiu-Fan says:
July 25, 11h

This may sound a bit nitpicky but I really dislike the use of Chinese characters as “glyphs” or as little decorations.

As someone able to actually read Chinese, I was really, really thrown off by the paragraph with the character for love. For one thing, I was reading two things layered on top of one another; both were still understandable, but I failed to see how the character was related to the English text and ended up somewhat confused.

Just wanted to register my disapproval with the appropriation of a language into just “something that looks pretty” (never mind tattoos with Chinese characters…)

Gilles says:
July 25, 11h

OT: I think there’s a problem in the source of the page. I see:

< ! & # 8 2 1 2 ; spans embedded in an a […]
so there you go. & # 8 2 1 2 ; >

Some filter is rewriting the < ! - - and - - > here.

Otherwise, great article, as always.

July 25, 11h

I’ve long been a Unicode fanboy, and it always makes me smile to see others I respect singing its praises as well. People often challenge me to explain the usefulness of having Unicode around, and I will add this article to my list of support. To prevent this post from being totally useless, I share this page I created as further demonstration:

Here is a chessboard made of text and CSS. Use your browser’s font size bigger/smaller feature to watch it scale (Cmd+ and Cmd- in Safari, for instance).
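The chessboard page itself isn’t linked here, but the underlying idea is easy to sketch. The chess pieces occupy U+2654 to U+265F, white first and then black, each run ordered king, queen, rook, bishop, knight, pawn. This minimal Python sketch lays out the starting position as plain text. It is not the commenter’s implementation: dots stand in for the empty squares, where a styled page would colour the cells with CSS.

```python
# Chess pieces in Unicode: white U+2654..U+2659, black U+265A..U+265F,
# each run ordered king, queen, rook, bishop, knight, pawn.
ORDER = "KQRBNP"
WHITE = {p: chr(0x2654 + i) for i, p in enumerate(ORDER)}
BLACK = {p: chr(0x265A + i) for i, p in enumerate(ORDER)}
BACK_RANK = "RNBQKBNR"


def starting_position() -> str:
    """Return the opening position as 8 lines of text, black at the top."""
    rows = [
        "".join(BLACK[p] for p in BACK_RANK),
        BLACK["P"] * 8,
        *(["." * 8] * 4),   # empty ranks; a styled page would colour cells with CSS
        WHITE["P"] * 8,
        "".join(WHITE[p] for p in BACK_RANK),
    ]
    return "\n".join(rows)


print(starting_position())
```

Because the pieces are text, they scale with the browser’s font-size controls, as the comment describes.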

Dave S. says:
July 25, 12h

Hsiu-Fan – no, you’re not being nitpicky at all. I’m a big fan of so I’m aware of how silly it looks for an English speaker to use Chinese characters out of context. (Though, to be fair, it’s my understanding it works both ways, and non-English speaking Asian designers are known to use English phrases for the same reasons [i.e. “It looks cool”])

But you might be walking away with the wrong interpretation here. The paragraph you’re referring to is taken from, and the character is the same as the favicon for that site. So the connection isn’t exactly obvious in this context, but there was a purposeful link between the two. (Of course, since I was the guy who created the favicon in the first place, you could argue it’s the original usage that’s flawed…)

tre says:
July 25, 12h

we all know that dave’s a lover, not a fighter. so in that context, it’s entirely appropriate.

rsrr says:
July 25, 12h

It shows up fine on IE6/XP, just don’t understand why they devised so many codes for all the same squares?!

Tini says:
July 26, 01h

“The project [..] has the objectives of creating a basis for fundamental typographic research and facilitating a textual approach to the characters of the world for all computer users”

Zach says:
July 26, 07h

I am using a freshly installed copy of Windows XP 2005 MCE, and Firefox 1.0.4, and I can only see 5 of the characters.

When I view the page in Internet Explorer, same result.

Alan O. says:
July 26, 08h


To see more of the characters on Windows, you need to add Supplemental language support.

In the Control Panel, open Regional and Language Options; on the Languages tab, select Install files for East Asian languages.

July 26, 08h

“As a literal character, copied and pasted into your markup from the character viewer. Make sure your site uses a Unicode encoding like UTF-8. (Although this is true for all three options in this list.)”

When you use numeric character references (NCR) or character entity references (CER), you do not need a Unicode encoding. US-ASCII or any other character set/encoding works (as long as you use NCRs or CERs for characters not in that character set). But of course, UTF-8 is a good choice in most cases. For example, old Netscape 4 supports more NCRs if you deliver the pages as UTF-8 rather than ISO-8859-1 or US-ASCII.
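Python’s codecs can demonstrate this point directly. Encoding to US-ASCII with the `xmlcharrefreplace` error handler turns every character outside ASCII into a decimal NCR. The standard library’s `html.unescape` then recovers the original text. The sample string below is illustrative.

```python
import html

text = "Snowman \u2603, Mercury \u263F"

# Characters that US-ASCII can't represent become decimal NCRs.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)   # b'Snowman &#9731;, Mercury &#9791;'

# An HTML parser turns the references back into the original characters.
print(html.unescape(ascii_bytes.decode("ascii")) == text)   # True
```

The resulting bytes are pure ASCII, so the page is safe to serve under any ASCII-compatible encoding.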

“As an escaped character entity in your markup, using the &#xNNNN; format (for example, ›) where NNNN is the Unicode number (can be less than 4 digits, e.g. ·) (Unicode Character Lookup)”

I do not recommend the hexadecimal notation for NCRs (that is, if you use NCRs at all; direct input is much better). Use the decimal notation (and a scientific calculator to convert between number systems). The reason is simple: Netscape 4 and earlier browsers. There are still some users on these dinosaurs, and they understand some of the decimal NCRs but none of the hexadecimal ones. So it is almost no extra work for us Web authors to use decimal NCRs instead of hexadecimal ones, and some users might benefit.
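If you already have markup full of hexadecimal references and want the decimal form recommended here, a regular expression can do the conversion. This is a minimal sketch. `hex_ncrs_to_decimal` is a hypothetical helper, and it only touches references of the form &#xNNNN;, leaving decimal references and named entities alone.

```python
import re

# Matches hexadecimal numeric character references such as &#x203A; or &#xB7;.
HEX_NCR = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_ncrs_to_decimal(markup: str) -> str:
    """Rewrite every &#xNNNN; reference in markup as its decimal &#NNNN; equivalent."""
    return HEX_NCR.sub(lambda m: f"&#{int(m.group(1), 16)};", markup)


print(hex_ncrs_to_decimal("next &#x203A; dot &#xB7; copy &#169;"))
# → next &#8250; dot &#183; copy &#169;
```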

July 26, 09h

The glyphs shown on the image-based page are beautiful. What font is that?

July 26, 09h

I’m using Firefox Windows and I get ‘em all. But I always go nuts on the fonts when I install Office!

On a related note, and the reason I’m posting, there’s an often interesting blog (if you’re into that sort of thing) by one of Microsoft’s Unicode developers. He has an ongoing “Every character has a story” set of posts.

July 26, 11h

I’ve been a fan of Unicode ever since I read Tim Bray’s excellent explanations (absolutely worth a read), but my main reason for posting is to point out the excellent Bitstream Vera fonts, which are free and open source. Because they’re open source, there is also a more exhaustive font family based on Vera, called DejaVu; see their page.

Hope I haven’t overdone the linking, but the fonts are good and free, and the more users who install them, the more freedom we will have when it comes to typography.

Also worth considering: as more users have extensive fonts, maybe we will be able to start using the ff, fl and fi ligatures, plus em dashes and en dashes. It is also nice to use the actual characters for ‘ ’ rather than having to encode them. I know we’re not quite there yet, but it is so much neater and more readable when you’re doing the coding.

You know, this Unicode stuff might just catch on one day.


July 26, 12h

Firefox 1.06 on XP SP1 with basic US install here, and I’m getting all but about three of the glyphs. And I’m very impressed with that. I’m assuming Microsoft’s fonts deserve the credit here.

July 26, 12h

sIFR is, of course, the solution here.

Joe Clark says:
July 26, 12h

There are several browser test pages, including:

And I have some links:

Your advice on how to use Unicode in files is subtly inaccurate but has been mostly corrected in the comments.

Jono says:
July 28, 05h

I am not sure if this has anything to do with the use of Glyphs (probably not) but I thought I’d mention it. I just installed FF 1.0.6 under Mac OS 10.4.2, and when I mouse over any of the links in the main content, the Glyph characters, and the white box area below the Glyphs shifts to the left. If I then mouse over the Glyphs themselves, everything shifts back to the right as it was before.

Also, hovering over one set of Glyph examples, and then hovering on another set of Glyph examples causes the same shift scenario.

Could be a FF 1.0.6 Mac bug? I have seen this before on pages I have worked on, but I cannot remember what the fix was, or what was causing the problem.

Safari 2.0 handles it perfectly, with no shifting.

IE/Mac 5.2.3 does not shift, but on mouse over it does produce one large blue block on top of the entire set of glyph examples making a large blue horizontal bar. I can send screenshots if you’re interested?

dusoft says:
July 28, 11h

My Firefox doesn’t have any problems except two characters on the right in the top two rows. So it’s not only Safari that displays it correctly (and it does not depend on the browser alone!).

Declan says:
July 31, 03h

Did anyone follow Dave’s link to the Unicode site?

Using this procedure, I enabled all but 3 of the characters in FF 1.0.6 on Windows XP SP2, which is the same result as Kelson (another FF user) at the top.

August 05, 02h

It is a bit difficult to get one’s mind (mine, anyway) around the Unicode universe. If you use Mac OS X, there’s a free utility that has been a great help to me: Unicode Checker.

Among other things, for every glyph, it shows what installed font, if any, can display it. Some of the characters in the post exist only in a few fonts (some asian) and one or two only in the “Apple Symbols” font. Apparently, Firefox doesn’t fall back on this font if it doesn’t find the glyph elsewhere.

squareman says:
February 06, 15h

Firefox (for me and OS X 10.4.4) doesn’t display Cyrillic characters on a UTF-encoded page even if the font is set to Lucida Grande (an Apple-installed Unicode font with Cyrillic characters). Oddly enough, if the encoded characters are enclosed in a PRE tag, they display properly.

Further experimentation revealed that if I used a simple stylesheet rule to declare “* {font-family: serif;}” or “* {font-family: monospace;}” the Cyrillic would display, but not if I defined the font-family as “sans-serif” or any specific font (regardless of whether the font supported Unicode or not).

Neither will Firefox show the Cyrillic characters when they are encoded as HTML character references (&#nn; or &#xnnnn;), so I’m not sure what gives here. Is this just another failing of making a one-size-fits-all browser (XUL-based and not taking advantage of system-level tie-ins)? Or will Firefox only behave with UTF if the server delivers the document as application/xml instead of text/html? (I can’t test this because I don’t have access to that on the server.)

Safari seems to play nice with any of the techniques that I mention above.