July 25, 2005 11AM PST

Oh, the fun to be had with Unicode.

Unicode, for those still unfamiliar, is a universal character encoding standard, jointly developed by a consortium with dozens of corporate and individual members around the world. The Unicode character set currently contains over 70,000 characters, covering scripts from around the world in both modern and ancient forms.

A bit more background follows, and then some usage analysis. Interspersed throughout this article are various Unicode characters; it’s highly unlikely that your browser/OS knows what to do with all of them. So unless you're running Safari, click on any set to view an image-based equivalent (which is a screenshot of how they actually do render in Safari).

Glyph Examples #1

Pictographs and Icons: ☣ ✈ ☎ ☠ and ♼

Although Unicode is meant for characters, the term 'character' appears to be used rather loosely. The scope of Unicode is quite broad:

Unicode covers all the characters for all the writing systems of the world, modern and ancient. It also includes technical symbols, punctuations, and many other characters used in writing text. The Unicode Standard is intended to support the needs of all types of users, whether in business or academia, using mainstream or minority scripts.

Glyph Examples #2

Intricate non-Latin characters: Ѭ ش だ ༃ and ௵

If you've never opened a character selection utility, there are a whole bunch of goodies waiting to be discovered. Both Mac OS X and Windows have useful utilities that allow you to get at the extended characters you won't find on the average keyboard. Windows has the Character Map (All Programs ➝ Accessories ➝ System Tools ➝ Character Map) while OS X has the more functional and comprehensive Character Palette (System Preferences ➝ International ➝ Input Menu ➝ Character Palette). Either one lets you browse the higher Unicode ranges that you can't normally type directly.

Glyph Examples #3

Strange decoration: ℄ ❣ ᴞ ℟ and ℋ

Given the current state of internationalization amongst modern operating systems and browsers, we're coming close to actually being able to use some of these advanced characters on the web. The issues are similar to those of specifying typefaces for a browser: the user's system must be able to render the character in question.

The biggest advantage I can see, aside from the obvious foreign language support, is the ability to include pictographs and icons within a document, without having to send an image over the wire. A little CSS styling means they can be set to any size, without consuming any more bandwidth, so the visual opportunities are promising.

The biggest caveat, though, is that any character you choose must have a representative glyph (the character pictograph itself) in the font you're displaying it with. It's unlikely that we'll ever be able to rely on a specific glyph/font combination being present on the computers of every user. It's also pretty much a given that the average typeface simply won't have glyphs for all 70,000+ characters, considering the work that goes into producing even a basic Latin character set. We're lucky, though, that most of the standard web set -- Verdana, Georgia, Arial, etc. -- have a larger number of glyphs designed than an average font (thank you, Microsoft).

Glyph Examples #4

Religious and symbolic glyphs: ☦ ☬ ☤ ☮ and ☭

Browsers are not equal in their rendering, unfortunately. This article was composed while testing in Safari, and looks absolutely great. Any other browser, including Firefox, doesn't. I can't even begin to speculate why this is, as I had assumed Unicode character assignment occurred on the operating system level. Suffice it to say, all browsers are not equal when it comes to rendering advanced Unicode characters.

Although it's probably easier and more reliable to embed these in an image for now, if you pay attention to the character sets and font combinations, it's possible to render them as text within a browser. Once you have a Unicode number (which you can find in your character viewer of choice) you've got enough to start playing. There are at least three ways to include them as character data, and likely more:

  1. As a literal character, copied and pasted into your markup from the character viewer. Make sure your site uses a Unicode encoding like UTF-8. (Although this is true for all three options in this list.)
  2. As an escaped character entity in your markup, using the &#xNNNN; format (for example, &#x203A;), where NNNN is the hexadecimal Unicode number. It can be fewer than 4 digits, e.g. &#xB7;. (Unicode Character Lookup)
  3. As generated content via your CSS, using the :before and :after pseudo-elements and the "\NNNN" format (e.g. li:before {content: "\203A";}).
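
As a rough sketch of how the three notations in the list above relate to each other, here is a small Python helper that starts from a character pasted out of a character viewer and derives its Unicode number, the escaped entity, and the CSS generated-content declaration. The function names code_point and three_forms are made up for this illustration and are not part of any library; the only assumption is a standard Python 3 install.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Return the hexadecimal Unicode number of one character, e.g. '203A'.

    Hypothetical helper for illustration; zero-padded to at least 4 digits.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"{ord(ch):04X}"


def three_forms(ch: str) -> dict:
    """Build the three notations from the list above for one character."""
    hex_number = code_point(ch)
    return {
        # Option 1: the literal character, for a page served as UTF-8.
        "literal": ch,
        # Option 2: escaped character entity in the markup.
        "html_entity": f"&#x{hex_number};",
        # Option 3: CSS generated content for :before / :after.
        "css_content": f'content: "\\{hex_number}";',
    }


if __name__ == "__main__":
    # Print only the ASCII-safe forms, plus the official character name.
    for ch in "›☃":
        forms = three_forms(ch)
        name = unicodedata.name(ch, "(unnamed)")
        print(f"U+{code_point(ch)} {name}: "
              f"{forms['html_entity']} | {forms['css_content']}")
```

The padding to four digits is only cosmetic: shorter numbers such as B7 come out as 00B7, which both the entity and the CSS escape accept. None of this says anything about whether a given browser or font will actually render the glyph; that still has to be tested by eye.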

Glyph Examples #5

Vaguely meteorological glyphs: ☼ ☁ ☂ ✮ and ☃

But it can be hit and miss since so many variables are involved. Still, the opportunity to experiment is there, and you may have noticed certain Unicode characters showing up in the wild -- from the mundane © copyright symbols all over the web, to accents like John Gruber's stars on the Daring Fireball Linked List (they appear after each link).

It might be nice to have a reference sheet with a bunch of reliable cross-browser glyphs that work today, were anyone up to experimenting. Otherwise, consider all this a simple look at the possibilities of what we might get to play with in a few years, if all the major browsers play ball. I'll leave you with a couple of enhanced, gracefully-degrading generated content examples to show how you might consider using these characters in a non-harmful way.

Glyph Examples #6

About This Page:

You are currently on a supplement page to “Glyphs”, an entry made on 25 July, 2005
