

Weblog Entry

Who Cares about Semantics Anyway?

May 30, 2005

On semantic markup, conveying its usage to those who generally don’t need to care, and a reusable markup guide for your enjoyment.

This is how I like to define the term ‘semantic markup’:

Semantic Markup is the result of using (X)HTML elements for their proper, intended usage.

This is a pretty limited definition; better ones exist, and it’s by no means the only viewpoint out there. The terseness is partially the result of HTML being semantically limited to begin with. We don’t exactly have a rich vocabulary of element types capable of capturing the meaning and nuance behind every piece of text: we have code, but we don’t have a general-purpose caption (caption only labels tables); we have kbd, but we don’t have childlikescrawl; we have em, but we don’t have publicationtitle. And so on.
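As a rough illustration of that gap, here is a hypothetical snippet. The first three paragraphs use elements HTML actually provides, for their intended meaning; the last has to fall back on a classed span because no publicationtitle element exists. The class name and the publication are made up for this example.

```html
<!-- Elements HTML provides, used for their intended meaning -->
<p>Run <code>ls -la</code> to list the files.</p>
<p>Press <kbd>Ctrl</kbd> + <kbd>C</kbd> to copy.</p>
<p>This is <em>really</em> important.</p>

<!-- No publicationtitle element exists, so a classed span is the usual
     stand-in (hypothetical class name). The meaning lives only in the
     class value, not in the element itself. -->
<p>I read it in <span class="publicationtitle">Example Monthly</span>.</p>
```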

Why care? It’s a good question, one I’ve also asked. If you spend time putting semantic markup into a page, there ought to be a payoff. Unfortunately, the payoff is less than visible. Since CSS is able to make any element look like any other element, you won’t see the result in a visual browser. It’s only when you load a semantically rich page in a text-only browser or a screenreader (or other alternative access device) that you’ll start to understand the benefit of authoring this way. Additionally, various stabs have been made at extracting the semantics from a document and putting them to more general use. See Mark Pilgrim’s Million Dollar Markup and Tomas Jogin’s Hierarchy.

XML gets us the ability to work around this limitation, of course. By defining new, semantically rich languages, we can add extra meaning to our documents (and while we’re at it, create tools to pull that meaning back out later and actually do something useful with it). But that’s strictly tangential, given the utility of HTML here and now. We use HTML for web pages, so its limitations are something we have to think about.

Semantic debates about which element to use in which scenarios are usually quite arbitrary and prone to subjective interpretation. Here’s a good example: the address element. The spec has this to say:

The address element may be used by authors to supply contact information for a document or a major part of a document such as a form. This element often appears at the beginning or end of a document.

Then it goes on to list an example with virtual addresses, in this case URIs pointing to the page author’s profiles. Presumably, email addresses would be acceptable in this space, but does that hold true for physical addresses? You may be surprised at how many different individual opinions exist about this, as evidenced by this SimpleQuiz from a year ago.

The ambiguity lies in the fact that the term ‘contact information’ is undefined in the spec, and therefore it’s left to the user’s best judgement to fill that in with whatever contact information they deem relevant. So one person’s email address is another’s street address, which is yet a third’s PO box, which is a fourth’s airport locker number.
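To make the ambiguity concrete, here are two hypothetical address blocks a reasonable author might write. Both are valid markup, and the spec doesn’t say which kind of contact information it has in mind. The names, URLs, and street address are placeholders.

```html
<!-- Virtual contact information: an email link and a profile page -->
<address>
  Written by <a href="mailto:jane@example.com">Jane Doe</a>.
  See also <a href="http://example.com/~jane/">her profile</a>.
</address>

<!-- Physical contact information: a street address. The <br /> elements
     are appropriate here because the line breaks are part of the content. -->
<address>
  Example Plumbing Co.<br />
  123 Main Street<br />
  Vancouver, BC
</address>
```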

And that’s all okay, you know. The spec is loose enough that it provides guidance, but actual usage is going to determine the most relevant and appropriate ways to use the markup in question. Individual preferences will guide deployment of the elements, and if a consensus is ever formed, so be it. Until then, vive la différence. HTML elements are basic constructs; it’s up to you to build something with them.

But even this gets tricky when you no longer have control of the semantics of a particular document. What happens when you pass your markup on to a client? Ever worked with a content management system that allows multiple authors to mix and match their own elements for whatever purposes they have in mind? How often do you find them choosing elements based on how they look, rather than what they mean? I see more than a few heads nodding.

“Just educate your clients”, you may think. I hope you understand why this doesn’t always work. As an analogy, compare and contrast these situations:

  1. You just built a site for a plumbing company, complete with a CMS. You tell them to make sure to use <ol> elements for parts lists, and <h2> elements for product names on their respective pages. Six months later, you check the site and find a ton of <br /> elements separating the list items, and headers wrapped in <font size="5"> tags. D’oh.
  2. In exchange for your work on the site, the plumbing company gives you some advice on your kitchen remodeling. They suggest copper pipes for your water lines running through the exterior walls, with a small length of flexible CPVC inside the house. You decide to go all CPVC because it’s slightly cheaper. Six months later, your pipes burst during a particularly cold winter night. D’oh.
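In markup terms, scenario 1 looks something like this. The product and part names are hypothetical; the contrast between what the client was asked to enter and what ended up on the site is the point.

```html
<!-- What the client was asked to enter -->
<h2>Kitchen Faucet Model 200</h2>
<ol>
  <li>Faucet body</li>
  <li>Supply lines (2)</li>
  <li>Mounting nut</li>
</ol>

<!-- What turned up six months later: it looks similar on screen, but
     nothing identifies the heading as a heading or the list as a list -->
<font size="5">Kitchen Faucet Model 200</font><br />
1. Faucet body<br />
2. Supply lines (2)<br />
3. Mounting nut<br />
```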

In both cases, a little bit of education goes a long way. In both cases, the client needs to take a little initiative of their own to ensure higher quality. In both cases, there will be people looking for shortcuts who simply won’t take the advice of the expert.

Simply put, you can’t assume the client will care unless the task they need to perform is personally relevant to them. Education can only go so far; there has to be motivation. But that doesn’t mean it’s not worthwhile to try anyway. The site may be delivered to a single person or company, but after that it will be used by a far wider audience who will benefit from proper semantics. That’s why it’s important, as a web designer/developer, to care about semantics and at least try to convey some of that importance to your client.

To that end, I’ve recently started work on a rough style guide that I distribute to clients as a part of my completed deliverables. This is a single HTML document, marked up with both examples and descriptions of various elements. It also serves as a palette of sorts, graphically depicting the various elements when rendered in the site’s style sheet (assuming the CSS file has been linked). It’s a rough start and will evolve over time, but it’s important enough to the education process (and easy to reuse) that I’ve relied on it ever since I wrote it.
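A bare-bones skeleton of that kind of guide might look like the following. This is a hypothetical sketch, not the actual mezzoblue Markup Guide; the stylesheet path and the guide-entry class name are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Markup Guide</title>
  <!-- Link the site's own stylesheet so each example renders the way it
       will on the live site (placeholder path) -->
  <link rel="stylesheet" type="text/css" href="/css/site.css" />
</head>
<body>
  <h1>Markup Guide</h1>

  <!-- One entry per element: a rendered example, then a plain-language
       description of when to use it -->
  <div class="guide-entry">
    <h2>Strong emphasis (&lt;strong&gt;)</h2>
    <p>This is <strong>very important</strong> text.</p>
    <p>Use strong when a phrase matters more than the words around it,
       not just to make text bold.</p>
  </div>

  <div class="guide-entry">
    <h2>Ordered list (&lt;ol&gt;)</h2>
    <ol>
      <li>First step</li>
      <li>Second step</li>
    </ol>
    <p>Use an ordered list when the sequence of the items matters.</p>
  </div>
</body>
</html>
```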

Here it is, the mezzoblue Markup Guide, available in formatted and bare-bones unformatted flavours, with a permanent home in the Downloads section of this site. Feel free to use, edit, and share alike. I’d love to see this expanded and improved upon by all of you, with revisions released under the same CC license.


1
May 30, 01h

(Wohooo! First comment. I was refreshing my browser because my internet wasn’t connecting, and then, BAM you posted :-p)

I think it’s good someone is still posting interesting content, Dave.

Well, don’t you think that’s one of the strengths of (X)HTML, though? It’s interpretable to quite a large degree. I mean, come on, if we had to have a different kind of header for each site, it would be ridiculous. We would have 40 elements just for headers.

Also, I don’t want to get into another HUGE semantic headers debate, but on your example page it says H1s are reserved for page titles. Why? Do newspapers or books, which are the most common header-using things in print, use one header per page? No; they use it when context changes. Do you see what I’m saying? Creating a new page shouldn’t be the only way to change the main topic.

(Gah. My internet died again. Curse you, linksys!)

2
cam c. says:
May 30, 01h

Dave, thanks for the HTML guide… I’ve been meaning to put something like that together for a long time myself.

I know what you mean about trusting clients to do what you tell them; the worst is the sites I’ve done with Macromedia Contribute as a CMS; I usually try to lock down a lot of stuff so they can’t wreck the layout system I’ve come up with, and then end up getting a phone call saying “How come I can’t change the font?”… :)

I generally like to try to use the CMS to make something as foolproof as possible, though; when I set up a site for my friend (www.freyburg.com), I used some of Movable Type’s customization features to make it as simple as possible for him to enter stuff. A good example is his book reviews… I told him to upload the photo into the “Entry Body” field, and the actual review into the “Extended Entry” field, then put the link to Amazon in the “Excerpt” field. A bit of a hack, but it guarantees that everything gets linked properly, and makes it very simple for him to maintain.

I haven’t had to actually do any maintenance on his site in the half year or so it’s been up, other than tweak the comment CGI a bit to try to keep out the dumber spammers…

3
Hunox says:
May 30, 02h

Maybe we should create a Wiki and compile a list of documentation. It’d be similar to the W3C’s, but with more examples and maybe tips and tricks for styling the tags, etc…

It’d be really useful for many of us. I was thinking of doing it by myself, but it’d be much better as a group effort.

4
May 30, 02h

((I’ll do the custom PHP coding for the wiki so we don’t have to have an overcomplicated one, if you want.))

If you search Google for “semantic definition lists,” you’ll find big discussions about the usage of DLs… just an interesting thing to note.

5
May 30, 02h

Nothing ground-breaking here, but the style guide is a nice deliverable that keeps you ahead of the curve (and it’s nice that doing this once will save you work on future jobs many, many times). Well thought-out post. I usually end up sitting down with the person or people that will be updating content and giving a short html course and a handwritten html cheat sheet. I think you’ve inspired me to sit down and make a proper one now.

6
May 30, 02h

Good article, but it still doesn’t give the designer much “ammunition” when arguing the point of semantics. The plumber analogy was pretty close, except that we see a problem with our cheaper choice within a few months, but the plumber may NEVER see a problem from using <font> and <b>.

I’m all for semantics, and try to steer my clients down the right path when it comes to updating their site, but unfortunately the only thing clients care about (for the most part) is what LOOKS right, not what IS right.

Can anyone think of any real-world, IMMEDIATE examples of why we should use semantically structured HTML?

7
May 30, 02h

Hunox:
I like the idea of a wiki of documentation for clients (with an open license) that we can all create and borrow freely from. I’m too busy to spearhead such a thing at the moment, but I’d be glad to help.

8
Oliver says:
May 30, 02h

It’s all about efficiency. If the code is efficient for the browser to render, then it’s good coding. However, the fact that a browser supports Semantics is debatable.

9
Dave S. says:
May 30, 02h

Richard M – “…it still doesn’t give the designer much “ammunition” when arguing the point of semantics.”

Completely agreed, this is still a problem. As I say, the benefits are largely unseen. There are very few ways to extract semantics out of a document at the moment that create enough motivation for the average web designer, let alone client, to start writing better markup.

Microformats might be a hint in that direction:

http://developers.technorati.com/wiki/MicroFormats
http://meyerweb.com/eric/thoughts/2005/05/18/getting-onto-the-calendar/

…were it not for the fact that they (the ones I’ve seen, anyway) just rely on collections of classed spans or attributes like rel. I generally tend to think that classes/ids and attributes don’t convey semantic information like HTML elements do, although microformats more or less imply that they do. It could be a case where usage dictates development, though, and given enough momentum perhaps classes WILL one day be considered semantic. I’m not sure how I feel about that right now.
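For readers who haven’t seen one, here is a rough sketch in the style of the hCalendar microformat, plus a rel-based tag link. The class names follow hCalendar as I understand it and may have changed, so check the microformats wiki before relying on them; the event details are invented.

```html
<!-- Event details marked up with hCalendar-style class names
     (vevent, summary, dtstart, location); the event itself is invented.
     Note that every element here is a generic div, span, or abbr:
     the meaning is carried entirely by the class values. -->
<div class="vevent">
  <span class="summary">Web Standards Meetup</span> on
  <abbr class="dtstart" title="2005-06-15">June 15, 2005</abbr> in
  <span class="location">Vancouver</span>
</div>

<!-- A rel-tag link: the rel attribute, not the element, says that this
     link names a tag, and the tag is the last segment of the URL -->
<a href="http://technorati.com/tag/css" rel="tag">css</a>
```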

10
ghola says:
May 30, 03h

Thank you for making this guide public.

Of course it would be great if someone came up with a one-liner capable of convincing clients of the importance of semantics, without sounding like we’re trying to sell them something.

11
May 30, 03h

Great article. I just touched on this subject myself in an article I wrote called Content-Driven Design.

http://mboffin.com/post.aspx?id=1719

The article I wrote is more geared toward getting designers to understand the idea of truly content-driven design. I think your article has a lot of merit for designers too, not just clients, as I’ve found the biggest stumbling block to /designers/ truly understanding the power of CSS and semantic markup is simply a lack of knowledge of what is really possible with semantically marked-up content.

12
Dan says:
May 30, 04h

Thanks for the download Dave! I like the snippet about abbr and acronym as well. I know enough designers that can’t agree on the differences (myself included) to know better than to try and explain it to a client, but I do applaud the effort! I have to admit that using DocBook in my day job leaves me wishing we had a more robust way to markup our content without having to resort to XML. Write once… display anywhere. Someday…

13
cam c. says:
May 30, 04h

Dylan, your article touches on some of the stuff a programmer I work with and I were talking about the other day…

We were talking about how a lot of designers see the technical aspect of a database driven site as the opposite of what they do, but in reality, if you have a chance to see both the backend and frontend of a website, the visual structure a designer creates often mirrors the logical structure of the database driving the site. CSS-styled semantic code is sort of the in-between, where the data structure and the visual design come together.

What I often find is that the semantics influence the way I design a site… sure, you can make an H2 heading look different in different div layers, but if you want all the H2s to have the same proportional weight in terms of importance, you’ll probably end up choosing a similar font treatment for all of them across the site.

Coding a site semantically as you build the visual look is a great way to keep in mind what is really important to present visually to the end user… it’s gotten to the point that when I start building a site in Illustrator, I even use a layer structure similar to the overall structure of the site.

14
May 30, 05h

Dave, was it by intention or accident that your ordered list example in the Markup Guide visually appears as a bulleted list?

15
Mike D. says:
May 30, 07h

Nice work Dave. The style guide is the item that we all know we should provide but often times, we treat it as either an afterthought or we neglect to include it at all. Your guide is concise, easy-to-follow and really quite essential for the long-term maintenance of the site.

That said, I’m still not entirely convinced of the importance of perfect semantics in 2005. In 2010, maybe. In 2005, not so much. You mention that the benefit is largely for assistive devices and I agree in that generally having a bunch of marked up headlines to scan over is a good thing, but I feel like the general ordering and cleanliness of the code and contents is more helpful to assistive devices than marking up every single abbreviation and citation and blockquote perfectly. In the end, both are important, but I just wonder how far tags actually go in creating a great user experience if a lot of other stuff isn’t specifically tailored for a non-graphical browser.

The other alleged benefit of semantic markup is that believers in the Semantic Web claim that machines will eventually be able to derive meaning from web pages in an intensely accurate way via the interpretation of tagged content. This is the more speculative benefit and one that, if it materializes, we likely won’t see until 2010 or so. Maybe then, maybe never.

16
May 30, 07h

I find myself in terrible trouble when I write essays, because I am seemingly unable to outline. It’s not that I can’t do it, simply that I find it’s easier to just transcribe exactly what is in my head and leave it to the reader to intuitively understand what I mean.

That’s the problem I see in teaching clients how to write syntactically correct and consistent code. Unless they’re an eccentric web design enthusiast, they are probably going to try to write like they’d write an essay, assuming the reader understands the gist of what each piece of information means.

You spoke about people needing both knowledge and inspiration in order to use proper markup; I think they also need to simply understand the philosophy. Your average user doesn’t understand that when you publish on the internet, your content needs to be comprehensible to computers as well as readers. I think a quick exposé on that philosophy would also be important in a client markup guide.

(Unrelated, there’s a bug firing on your content guide — in Firefox 1.0.4 when you hover the cited link, it causes the markupguide div to become a tad anorexic. No idea why it happens, but it’s strange.)

17
Evan says:
May 30, 10h

This is good stuff, Dave. I like that you call out specific cases where em and strong might not necessarily apply.

As for “ammunition” for semantic markup — agreed that the case is a little weak right now. Perhaps the simplest argument you can make is that semantic markup integrates nicely with CSS. If you want to change the style of all your table captions, it’s easy to do so if you have been consistently using the caption element. Of course there’s no *technical* reason why you couldn’t use spans and classes, but then you have to reinvent your own syntax every time. When the language hands you a nice clean label, why not use it?

I’m also seeing the same ordered list glitch as Robert Hahn. I’m using Safari 1.x on Panther.
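A minimal sketch of Evan’s caption point: when every table uses a caption element, one CSS rule restyles them all. The table contents and the table-title class in the alternative are hypothetical, and in a real page the style element would sit in the document head.

```html
<style type="text/css">
  /* One rule reaches every table caption on the site */
  caption { font-weight: bold; padding-bottom: 0.5em; }

  /* The span-and-class alternative also works, but the class name has to
     be invented (and remembered) per site; hypothetical name shown */
  span.table-title { font-weight: bold; }
</style>

<table>
  <caption>Parts list for Model 200</caption>
  <tr><th>Part</th><th>Qty</th></tr>
  <tr><td>Supply line</td><td>2</td></tr>
</table>
```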

18
Kev says:
May 30, 11h

My own personal favourite definition of semantics is:

“the study or science of meaning in language.”

The key point here being the ‘meaning in language’ bit. We should use semantic language to give our *documents* meaning, as opposed to our displayed pages. The reason we should do this is to make life easier for everything that uses our code.

This really ties in with what Richard is saying above re: ammunition for us to sell the idea of semantic markup: if markup is semantic then browsers stand a much better chance of rendering it as you intended. As do PDAs, WebTV etc etc. Just as importantly, there are good indicators to suggest that search engines find working with semantic code easier, which could improve your site rankings. Lastly, I would think that if you compared a page coded with semantic markup with the same page written in non-semantic markup, then the semantically written page would be faster loading.

All of which is a long way of saying that if we give our markup as much meaning as possible we stand a greater chance of having that meaning interpreted more accurately and faster.

19
Greg Dyke says:
May 30, 11h

Just a note about Definition Lists. You say that a dd can’t exist without a “parent” dt. Technically, the dt isn’t a parent, but a previous sibling.

20
Anne says:
May 30, 11h

If a DT element has no (next) sibling DD it shares the definition of its (next) sibling DT element. That way multiple terms can share the same definition.

I think you might want to correct that before incorrect usage spreads.

It’s also confusing that ordered and unordered lists are styled the same way.
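A minimal sketch of the definition list structure described in the last two comments: dt and dd are siblings inside the dl, two consecutive dt elements can share one dd, and a single term can also have more than one dd. The terms are placeholders.

```html
<dl>
  <!-- dt and dd are siblings: both are children of dl, not of each other -->
  <dt>Pipe</dt>
  <dd>A tube that carries water.</dd>

  <!-- Two consecutive terms sharing one definition -->
  <dt>CPVC</dt>
  <dt>Chlorinated polyvinyl chloride</dt>
  <dd>A plastic used for water supply lines.</dd>

  <!-- One term with more than one definition -->
  <dt>Flux</dt>
  <dd>A paste applied to copper joints before soldering.</dd>
  <dd>Continuous change.</dd>
</dl>
```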

21
May 31, 01h

In your “Markup Guide”, you in fact abuse markup by trying to demonstrate each element with its own markup - for example, by enclosing “var” in “var” tags. That’s not only contradictory, it’s counterproductive.

22
May 31, 01h

Some people (Anne) are moving back to SGML HTML. But I’m starting to think it is time to move to generic XML. The epiphany came when I was having trouble converting a PowerPoint slide to HTML. I decided on a whim to just make up my own tags and use CSS. It just worked…. in MSIE6, Mozilla Firefox, and Opera. If you don’t care about browsers released more than four years ago, then IE6-compatible markup and CSS would be safe. The only questions are MIME and some CSS compatibility issues. Is it time for just using XML now?

23
May 31, 01h

This is just great, thanks Dave; it’s gonna save me a lot of time teaching some people how to write basic (and correct) HTML. I was recently thinking of doing something like that for my own use but now there’s no need to.
You may also want to cover the fact that a Definition Term can have more than one Definition Description ;)
Don’t make it any more complex with tables etc.; it’s just gonna look like another tutorial and will scare people away. If needed, just add the img tag. Well done again.

24
djl says:
May 31, 01h

Thanks for the guide Dave! Good stuff.

A few people pointed out that there are some incorrect technical statements (like dd’s not being children of dt’s), but I think we shouldn’t overlook the primary audience here. CMS users, clients… we can assume people with little technical experience, so if the guide goes on about sibling elements you can bet you’re going to lose your reader. As it is, it gets the point across.

On the real-world, immediate gain of using XHTML and semantic language, it’s mostly going to depend on who you’re trying to convince. Lately I’ve been pointing to file size, search engine indexing, and accessibility as the key reasons. But if your client doesn’t care about those things then what are you going to do? The truth clearly is that nothing’s going to happen to a site that uses tables and inline styles for everything. How do you convince someone not to use a shortcut when they don’t identify any of the downsides as downsides? There are always going to be the knuckle-heads (technical term) that don’t listen to reason.

25
Rimantas says:
May 31, 02h

I was going to ask whatever happened to microformats, but I see you’ve already mentioned them…

Another point is CMS. After some flirting with WYSIWYG editors in CMSes (HTMLArea and the like) I came to the conclusion that all this is no good. For now my choice is solutions like Markdown or Textile. When someone writes content, let him concentrate on content with some basic, intuitive, email-style markup. One doesn’t have to mess with (X)HTML, be it the WYSIWYG or the hand-coded way.

On the other hand I am a big fan of style guides for webmasters.

26
Mo says:
May 31, 02h

“Can anyone think of any real-world, IMMEDIATE examples of why we should use semantically structured HTML?”

Search engines. Screen readers. PDAs. Internet TV. Mobile telephones.

The majority of the above don’t support CSS, and those that do barely scratch the surface in terms of depth of supported features.

“I decided on a whim to just make up my own tags and use CSS. It just worked…. in MSIE6, Mozilla Firefox, and Opera. If you don’t care about browsers released more than four years ago, then IE6-compatible markup and CSS would be safe. The only questions are MIME and some CSS compatibility issues. Is it time for just using XML now?”

I’ve been flirting with a similar idea - primarily, I’ve been playing with creating DocBook-XML “article” and “refentry” documents and just handing them off to a browser with some CSS. Common support is almost there (support for numbered section headings is still a little shaky, and the CSS required for IE “needs work” [notably, the lack of child selectors means you have to be particularly careful in places]), and it won’t be ready for commercial-site-use for a very very long time, without some content-negotiation and server-side scripts.

27
Mike D. says:
May 31, 06h

Martin: Sure, possibly 2008. I could agree with that maybe.

Faruk: Totally disagree. It has yet to be proven that good semantics have any effect at all on Google rankings. Google is, for the most part, a brute-force-reg-ex-crunching powerhouse. It looks mainly at URLs, things *between* tags, and relationships between pages and sites. Its algorithms rely a lot more on parsing bad code than you might think, mainly out of necessity. They made their name, in fact, by deriving meaning where semantics were lacking: on a web full of sloppy code. There are certainly good reasons to use proper semantics in your code but Google rank isn’t currently one of them.

Zach: I agree. It’s not very productive to toss semantics out the window. I am simply suggesting that as of right now, you should make proper semantics part of your routine because it’s the right thing to do and it may eventually be of great use… not because there are tremendous tangible benefits at this point in time. I’m just trying to be honest about it here.

28
Dave S. says:
May 31, 06h

The dt/dd description has been updated a bit after the comments in here, and ol list items no longer have bullets in place of numbers. (That they were bullets in the first place is a function of how often I use ol; for the last year my main stylesheet has styled “#mainContent li” with a bullet, and it’s never been a problem before now.)

29
Marco says:
May 31, 07h

Hey Dave.

I can definitely relate to this as this is a part of what I do as an Intranet Web Designer.

I don’t go as in depth with the descriptions of each element (due to the experience of the user), but I do give a ‘Style Guide’ to the people I’m working with to show them what all the possible elements will look like for their respective design.

Now, as for CMS, I am currently the CMS :) What this means is that I end up educating the users, who are used to WYSIWYG editing and seem to be surprised that the designs have no tables. ‘What? No tables at all?’ is usually the most common response.

However, I give them the analogy of using Word and how you set the respective styles. Paragraph text, Lists, etc.

Once I do that, the users get a better idea of where I’m coming from. It’s not perfect, but it’s the closest to a CMS we have at the moment.

ps: Where’s the table element breakdown??? ;)

30
Martin says:
May 31, 09h

Thanks Dave, you rock!

I usually define a semantic page of code as a page I’m able to comfortably peruse (not necessarily intensely read) without a browser. That being said, I agree on not really being able to define it precisely.

Mike D, I don’t fully agree with you… We don’t *really* need semantic code right now, but 2010? I’m thinking more 2007/8….

31
May 31, 09h

Richard M – “…it still doesn’t give the designer much “ammunition” when arguing the point of semantics.”

Very easy answer for you: Google.

Google loves semantic pages (as do virtually all search engines, but virtually no client these days asks for “search engine success” but rather, “I want to appear high in Google”), and web standards / semantics will very effectively help boost your page rank and search engine results placement.

As far as you can do anything on the site itself, to help boost its rank in Google, semantics are THE way to go.

That’s one killer argument, I would say.

32
May 31, 10h

“That said, I’m still not entirely convinced of the importance of perfect semantics in 2005. In 2010, maybe. In 2005, not so much.”

Mike, I don’t think I agree with your view, because it isn’t very productive. What will happen? Will everything suddenly become semantic overnight? No. We need to discuss and work on our skills until we’re all coding properly.

And anyway, correctly (and thoughtfully) coded (X)HTML IS semantic.

Also, I don’t remember who touched on this, but I think we should start thinking about moving to XML-like HTML. It’s so much cleaner than the old SGML-like HTML.

33
Kev says:
June 01, 01h

Mike D: I disagree. I’m not sure if you work for Google but if you don’t you seem to be assuming an awful lot about how the algo processes sites.

It’s well established, for example, that G uses a fairly advanced form of keyphrase stemming. That’s not a light technology. Any algo that can introduce stemmed keywords can certainly extract meaning from semantic markup. You say it’s yet to be proven and I agree. However, it’s a redundant point: *everything* is yet to be proven when optimising for search engines. The only things we can definitely rely on are what’s written on G’s (or Y’s or MSN’s) own help pages. Does the weight of evidence favour your view? I don’t know. All I do know is that, for me, semantic pages seem to carry more weight in SERPs. If you find otherwise then that’s fine, but I’d move away from such a definitive position if I were you.

34
Apollo says:
June 01, 03h

Well, Dave, maybe it’s an opportunity to test the effects of semantics on SEO: someone (I’m not important enough :-) ), when they redesign a big-ass site, could record all of that site’s rankings and compare them to before. I’d think someone would have done this before, though. Also, it would be hard to distinguish between more users because of the better design (word-of-mouth, then search) and more users from SEO alone.

I just figured it out, Dave! When you post two dashes (- -) it messes up your comment preview, because you commented out something which contains the comment. So, in effect, the two dashes (- -) end the comment.

35
June 01, 05h

Mike D: sorry, what? Sure, incoming links aid much more in your google ranking, but Google has definitely been favoring clean, semantic sites over nested tables-ridden sites, or frameset sites, and so forth. At least for about a year, now. I’ve seen sites without any incoming links show up on Google in the first 10 results, having semantic markup and getting a pagerank of 5, versus sites with plenty of incoming links, but being frameset sites with horrible markup, pagerank 0, and hardly appearing in Google.

Do good URI’s work great in Google? Yes, definitely.
Do incoming links help boost your rank in Google’s results? Yes, definitely.
Does semantic markup help you? Yes, not as much as the previous two, but it _does_ help.

I’ve found of late that Google is doing a lot with headings, which is exactly what it should do. This is very simple semantic markup making a difference.

36
elv says:
June 01, 06h

Rimantas: It’s true that WYSIWYG editors like HTMLArea allow people to do anything, and this is no good. Nobody wants people to write a page in purple Comic Sans and ruin your carefully crafted layout by adding columns.

Another WYSIWYG editor has a better approach: TinyMCE. With this one you can define what the user can do. You can activate or deactivate every button and list. You could choose to simply offer a few buttons with strong and em and a list of predefined styles.
http://tinymce.moxiecode.com/

37
Mike D. says:
June 01, 07h

Sorry Faruk, but I call BS on that one. There is no way a site with zero incoming links will receive a high PageRank because it properly uses heading tags. I think you’re confusing two issues here (and possibly three):

1. First of all, you mentioned framesets a couple of times. Throw that out because obviously that has a negative effect on search engine ranking and has for 10 years. The reason for this is obvious: splitting your documents up abstracts important information to disparate locations. And by the way, nothing about using framesets is “non-standard” either. It’s part of the standard… it just generally isn’t a good idea unless necessary.

2. You mentioned sites with plenty of incoming links and PageRanks of 0. Please show me these sites. The only way I know of that that could happen would be if the site were either completely unreadable or specifically penalized for trying to spam Google. If a site has 5000 incoming links and 50 tables and font tags, it’s not going to be a zero, sorry.

3. I think the main confusion here is that “writing semantic markup” and “writing SEO-optimized markup” are not the same thing. The former is an obvious skill which involves following W3C specs and doing all the things that our host Dave so generously spells out on Mezzoblue every week. The latter is not so cut and dry. It involves things like:

– How far up the page certain words appear.

– The ratio of a certain word to total words in a page.

– The proximity of certain words to each other.

– Use of the title tag and certain meta tags.
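To make the first three factors concrete, here is a rough Python sketch that measures them for a block of plain text. It is purely illustrative: the function name `keyword_metrics` and the exact definitions (word index of the first occurrence, occurrences divided by total words, smallest word gap between two terms) are assumptions for the example, not how Google or any other engine actually scores a page. The title and meta tag factor isn’t modelled at all.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_metrics(text, word, other):
    """Return (first_position, density, proximity) for `word` in `text`.

    Hypothetical definitions, for illustration only:
    first_position: 0-based word index of the first occurrence of `word`, or None.
    density: occurrences of `word` divided by the total word count.
    proximity: smallest distance, in words, between `word` and `other`, or None.
    """
    words = tokenize(text)
    word, other = word.lower(), other.lower()
    hits = [i for i, w in enumerate(words) if w == word]
    other_hits = [i for i, w in enumerate(words) if w == other]

    first_position = hits[0] if hits else None
    density = len(hits) / len(words) if words else 0.0
    proximity = min((abs(a - b) for a in hits for b in other_hits), default=None)
    return first_position, density, proximity


if __name__ == "__main__":
    sample = "Semantic markup helps screen readers. Good markup is semantic markup."
    print(keyword_metrics(sample, "markup", "semantic"))  # → (1, 0.3, 1)
```

Note that nothing in these numbers depends on which elements wrap the words, which is exactly why clean semantic markup and SEO-tuned markup can diverge.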

Both the former and the latter can be considered “writing good code” and the former can certainly lead you in the direction of the latter (which is a good thing) but to say that proper semantics in themselves have a dramatic effect on PageRank is just not true at this time. Ask people like Keith Robinson, who have experienced first hand that taking a site from crappy code to clean, semantic code can sometimes even *lower* PageRank. It’s not lowered because of good semantics, but rather because the makeup of the page has changed in an SEO-negative way… and surprise surprise, the increased “quality” of the code couldn’t make up for it. I really wish I could find the page (but I can’t and I’ve searched for a few minutes now) but someone did a bunch of Google tests relating to this exact subject several months ago and found that semantics themselves really had very little effect on PageRank… page composition did though. I’ll keep hunting around and post the URL when I find it.

So anyway, I guess what I’m saying is that one shouldn’t go around proclaiming that proper semantics have much of a noticeable effect on PageRank because they don’t appear to. No one has proved they do, and people have so far only proved the opposite… that they don’t. Remember, Google, at its core, does not exist to reward people who write good code. It exists to reward people who write good content.

38
Dave S. says:
June 01, 08h

re: this semantics vs. Google issue, allow me to share an anecdote.

When I was delivering a keynote last year, I referenced the study Mike is talking about in comment #36 - http://peterjanes.ca/blog/archives/2004/08/16/silly-result-set-1 - and said that while we once thought semantics were important for Google, evidence suggests otherwise, and that proper markup doesn’t necessarily give a site the boost we’d hope for.

I had an SEO approach me afterwards and give me a tip. He said that, all other things being equal, proper semantics *do* improve a page’s ranking. By “all other things being equal”, that meant incoming links etc. The flaw in this study, he theorized, was that it didn’t have much PageRank to begin with, so the results would be skewed. Were it an existing, well-linked page, the results would be much different.

I’m repeating this as simple hearsay though; I haven’t seen results to back it up. Anyone reading this deal with SEOs? Can we get some supporting links?

39
Mike D. says:
June 01, 09h

Dave: Excellent. Yep, that’s the article I was referring to. Nothing the SEO said to you necessarily conflicts with what I’m saying to Faruk here: it can indeed “matter,” just as having a 1.6 GHz machine instead of a 1.5 GHz machine can matter (although, as I said, it hasn’t been proven). The point is that there are probably at least 20 different factors that can cause the 1.5 GHz machine to outperform the 1.6 GHz machine (RAM, video card, processor type, etc.).

The problem is that in the world, “all other things” are never equal, so all I’m saying here is that if semantics matter at all to SEO, their effect is hardly noticeable at this point. Those who make claims to the effect of proper semantics completely reinventing a page’s SEO-effectiveness are being disingenuous in my opinion.

40
Dave S. says:
June 01, 09h

“if semantics matter at all to SEO, their effect is hardly noticeable at this point”

That’s where I’m at too. Proper semantics are nice, and will likely help SEO to some degree, but they’re not a huge Google boost on their own. There are a lot more factors in play that probably matter more. The 1.5/1.6 GHz analogy is great.

I’m certainly willing to revise that opinion in the face of solid testing and analysis, but I’ve yet to see any beyond Peter Janes’ initial piece.

41
June 02, 01h

Mike: seems you’re definitely right on one thing. I can only find frameset sites with incoming links and a PR of 0 or 1. No non-framebased sites I’ve found (in a quick 10-minute scan) have such low PR when they have incoming links, but that doesn’t surprise me, as I never said semantic markup was superior to incoming links for Google (in fact, I did acknowledge the opposite).

However, I do have one example of a site with “no” incoming links and a PageRank of 4: xhtml.nl.
This site has no incoming links to www.xhtml.nl, and the only links to the bare xhtml.nl come from the site ITSELF. Yet it has a PageRank of 4.

I stand corrected in that semantics don’t help as “much” as I thought they did, but the XHTML.nl example does at least show that there are cases where it’s definitely not incoming links boosting a site’s PageRank. Whether it’s the semantic markup of XHTML.nl or something else, I can’t quite tell, but it’s certainly not incoming links.

Additionally, Google’s been changing their system over the past week, I hear. Who knows what might change concerning this, now and/or in the future.

42
June 02, 02h

Dave S:

I think there’s only one truly proper way of testing this: register 2 new domains and put the exact same content on each, one in semantic markup and the other in horrible markup. Same URI structure, no publication of the URLs anywhere; submit both sites only to Google for crawling and then, after a week, check the results. Do they both show up? Does one site rank higher than the other?

Maybe I’ll just do this experiment after the @media conference. Definitely don’t have time for it before then…

43
June 02, 04h

Mike D: I’m using someone else’s IRC plugin; I should ask how it works exactly, as I don’t know myself. I do know, however, that its results are reliable, as they are exactly the same as what the toolbar gives.

I checked one of the first actual content pages of that frame-based site, and it has a PageRank of 2, rather than the 1 that the site as a whole has.

44
Dave says:
June 02, 05h

Would it be possible for us to create a page with all known information? A wiki would be very suitable for this. Then at least we would have everything in a single place.

45
Mike D. says:
June 02, 07h

Faruk: I’m conducting some tests as we speak… just for kicks. They are similar to the tests Peter Janes conducted, but with a slightly different methodology. I have a pretty high PageRank to begin with, so hopefully what the SEO guy mentioned to Dave at that conference won’t be an issue here. I’ve actually wanted to do this for quite some time but just never got around to doing anything too scientific.

As for the “xhtml.nl” example you gave, I reckon that site’s PageRank might come from the fact that there are 1450 references to the term “xhtml.nl” on the web, and that is the exact domain of the site you’re talking about. Domain character matches do a lot for PageRank.

46
June 02, 09h

Mike D:

Do keep us (or me, at least) up to date on the details of your tests. Dave’s (not Dave S.’s) idea of a wiki about such things might have some merit, though I think a well-written article that fully takes all of our tests into account would be more useful.

As for XHTML.nl: I see. Well, it seems then that Google severely punishes frame-based sites, because the main frame-based site I was using for comparisons has 19,000 mentions in Google for just the domain without the TLD, and 17,500 for the domain name _with_ the TLD. Still, that page has a PageRank of 1… even with over 800 incoming links.

47
Mike D. says:
June 02, 11h

Faruk: What tool are you using to check PageRank, out of curiosity? I don’t think that Google actually punishes framed sites (unless they do other things to get themselves punished), but generally an *actual* frameset page will receive a low PageRank because there’s simply nothing to index. There’s not even a body tag in a frameset page… and no content either, unless the site uses the noframes element. Have you tried checking the PageRank of any individual frames within the site? That is, documents with actual content in them?

48
Lea says:
June 03, 09h

Thanks for the download, Dave! This will prove useful not only for illustrating semantics, but for barebones styling purposes, like say… style guidelines at the beginning of a project. :-)

49
June 08, 12h

Anyone got any pointers on how to handle dates on a web page? The common way is wrapping them in a semantically meaningless <span> (or in some cases a div). Headings are out of the question, as are paragraph elements. So, what to do?

50
Peter J. says:
June 21, 06h

Regarding comment 37 - http://www.mezzoblue.com/archives/2005/05/30/who_cares_ab/comments/#c011764 - I never really followed up on the “study” (which is a generous term: as I wrote at the time, it was “a fun experiment” that was “not exhaustive and hardly scientific”). The current results suggest there might be something to semantics and Google after all though: while the first link is still to plain old text, the rest are in what I’d deem the “correct” order, with no detectable difference in PageRank.

51
February 06, 05h

From my experience semantics are very low on the priority list. I just believe search engines have way too much processing to do to worry about such fine things as element relationships. They’re purely content hoovers. Referring to Google, the single biggest impact in terms of rank is the quality, phrase and destination of a hyperlink.

You could argue that basic semantics, such as document structure, are a factor, but I’ve not noticed any discernible ranking benefit from improving the structure of a site. Web standards certainly improve the content-to-markup ratio, which may have more impact, but a lot more research would need to go into it before semantics earned a place among the top 10 tips for search engine traffic…