
Weblog Entry

Validation, Moderation, Constipation

June 17, 2004

Validation matters. No it doesn’t. Validation is hard. No it isn’t. Standards are flexible. No they’re not. Does this conversation sound familiar? Updated 18 Jun 2004

While variations of the debate over the ease and importance of following the standards to the letter have been flying around for years, the arguments appear to be heating up once more as summer approaches.

There’s a cross-site conversation about validation happening at the moment, and you may have seen it in places I haven’t. On this site anyway, a seemingly innocuous opinion piece from last week calling for moderation and sensibility saw the discussion fly completely off the rails and devolve into a sordid display of finger pointing and blame. A misunderstanding is occurring between two loosely-defined groups that is resulting in unnecessary hard feelings.

(I’m going to generalize here and label everyone as a ‘designer’ or a ‘coder’ to illustrate my point; recognize these statements are general trends; exceptions exist, and there are people who are both or neither.)

When standards-conscious designers validate their XHTML and CSS templates, everything is nicely compliant up until the point where they start tying in the necessary automated systems like ad software, CMSes, or e-commerce apps. The tools then get in the way, and fixing the resulting validation errors requires code, but a lot of designers don’t code. Because the issue is technically out of their hands, they look for ways to justify the validation failure and avoid responsibility.

Coders that care about standards have fixes for the problems they experience with software. It’s relatively easy for a coder to automatically escape ampersands in generated code; at the very least, they know how to Google the fixes. At best, they can scrub all automated output, fix errors, and tune their HTML to a degree most designers would only dream of. And many of these coders look at the designer’s failure to duplicate their success as laziness or worse, without realizing that a designer’s skill set simply does not extend to the world of fixing code output.

This divide is illustrated nicely in Mike Davidson’s article which I linked yesterday. Mike is looking for reasons not to validate because he deals with these types of systems, and fixing them is a Herculean task he doesn’t have the resources for. The fun begins when Steve Champeon pokes his head in (comment 15) and attacks Mike’s arguments on firm logical grounds. But Steve throws in a telltale sign that he has unrealistic expectations of Mike, by referencing “the slacker ethic” and stopping just this side of calling Mike lazy.

There are two truths here. One, people of all different backgrounds and talent levels are creating content for the web. Two, there’s still a remarkable assumption that one who does a certain job for the web does all jobs for the web, which even comes from colleagues who should know better. That the assumption exists is true; the assumption itself is suspect. Everyone who survived the past 5 years of economic downturn did so by diversifying, but specialties still exist simply due to the massive variety of possible skills that no human could master in one lifetime. I don’t expect a coder to know the first thing about monitor colour profiles or typography; it’s great if they do, but I don’t expect it. Likewise, a coder should not assume a designer knows how to deal with data, or fix validation errors using code.


Okay, whew, we’re through the build up. That wasn’t even the part I wanted to write. This is:

Can we all just shut up and start listening to each other please?

Designers: the coders are trying to tell you that there are ways to encode those ampersands and validate things like comments, do it with minimal time investment, and make it happen automatically.

Coders: the designers are trying to tell you they don’t have the skills to implement these methods and they need guidance here.

During the raging debate on this site last week, two contenders — Jacques Distler and D. Keith Robinson — both contributed their share of the blows, but after the dust settled they ended up working together to solve Keith’s problems. A coder willing to share his or her experience is worth 20 coders who will point out a problem without suggesting a fix. A designer willing to accept help instead of sweep the problem under the rug and justify it with spurious logic is golden.

What we need is to start working together like this on a larger scale. You’ve gone to lengths to programmatically fix improperly nested tags? Great, write it up. You have a killer PHP function for parsing out raw ampersands that can be copied and pasted into a site-wide header? Perfect, share it. You can make a bad tool better? Do it! We don’t have to keep re-inventing the wheel for every new site, we can build common code bases that make validation painless and share them.

A knowledge base of tricks and tips for dealing with bad HTML would go a long way if it’s easily understandable by the non-technical, or even better, copy and pastable. Arguing is a good way to waste time, but if you really want to change the way someone works, show someone how it can be done. Don’t leave it up to them.
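In that copy-and-paste spirit, here is a minimal sketch of the kind of site-wide fix being asked for. The function name is mine, not from any of the tools discussed here; it assumes PHP with PCRE available, and handles only raw ampersands.

```php
<?php
// A copy-and-paste sketch: buffer all page output and encode bare
// ampersands just before the page is sent. The function name is
// hypothetical; any function that takes and returns a string can
// serve as an ob_start() callback.
function encode_bare_ampersands($html)
{
    // Replace '&' only when it is not already the start of a
    // named (&amp;), decimal (&#38;) or hex (&#x26;) entity.
    return preg_replace(
        '/&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#x[0-9a-fA-F]+;)/',
        '&amp;',
        $html
    );
}
ob_start('encode_bare_ampersands');
?>
```

Pasted above the doctype in a site-wide header, this turns “fish & chips” into “fish &amp;amp; chips” in the source while leaving an already-encoded entity alone.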


Now this is more like it. David Osolkowski shares his PHP. Anyone else? Let me know, I’ll start building a list:


Reader Comments

1
jim says:
June 17, 01h

hmmm….

i work for a company that has a multi-discipline team of coders & designers, (hopefully) when a designer can’t hack it - they seek out a coder, when a coder doesn’t know what colour or heading tag to use, they seek out a designer…

..now there is some crossover - some coders can do a certain amount of design, and some designers can even do some code, heck - officially i’m a designer - but the coders call me a geek!…

…I think what I’m trying to say is that there is no black/white boundary and that to get on in this wonderfully fuzzy world of ours we have to be ‘grey’…

..ie, lets cut each other some slack here - coders, designers, whatever - we are all trying to achieve the same ends, lets ‘justify’ the means… :)

note to self: do not post into blogs after visiting tha man.

peace.

2
June 17, 01h

I think you’ve simplified it too much here Dave. The problem is not that a lot of us designers don’t know how to encode the ampersands. We’re not looking for guidance (and I don’t think Mike is looking for a reason not to validate). The problem is that we are not in a position to be able to encode them. The companies we work for or those we currently purchase solutions from don’t want to commit the resources to fix the problem—whether it’s man hours adjusting the product that generates the code or server processor time that corrects faulty mark-up right before it is sent to the browser.

3
Mike says:
June 17, 01h

…the designers are trying to tell you they don’t have the skills to implement these methods…

Really? Here’s the thing about that: it’s always seemed to me that designers have been trying to tell me to go to hell. Interestingly, most designers are designers, in that design is their identity. Pray tell, why should a designer care about such trivialities as HTML entities? I think all those complicated numbers and symbols remind them of algebra, which is all very scary and, frankly, that’s a job for a robot, not a living breathing human being! Can’t you make the computer do all that?

I could spend some time satirizing the engineer mentality, but I choose not to. I’m terribly busy.

Anyway, if you really want both the engineer and the artist to see each other’s point of view, you have to get over the longstanding stereotype that the engineer is heartless and inhuman and the artist is useless and impractical. Speaking as one who wholeheartedly embraces both occupations, I can tell you how frustrating it is to encounter engineers/artists who lift their noses high in nearly palpable disgust when they encounter anything creative/technical. I view those people in perhaps the same way a prehistoric proto-mammal might view a dinosaur - a relic of cruder, less enlightened times. The mammals might slap the dinosaurs across the mandible, or scream in their faces, “Evolve, dammit!” but they don’t listen. They never listen.

4
Mike D. says:
June 17, 01h

Hear, hear!

As I’ve mentioned before, one of the things I most admire about Dave is his ability to unite. While I may go off and write a rant filled with hyperbole in order to get people thinking, Dave’s blogging style is much more diplomatic and I love that about him.

Just a side note though: My lack of desire to use a strict doctype has more to do with the fact that our production environment requires leniency than the fact that I am unaware of the fixes for certain problems. Just a quick true-to-life example which happened last week: let’s say our sales team puts together a last minute deal to display an Orbitz module on the front page of ESPN for a lot of money. The module must go live by the time a certain sports event happens and the Orbitz team is writing the code. Using a loose doctype, we can take their code as is, execute on our sales agreement, and everybody’s happy. Using a strict doctype, we’re pretty much going to be telling our sales team “Sorry we can’t collect this money. Our site isn’t tolerant enough.”

If partners who spend a lot of money with us view us as difficult to deal with, then that’s never a good thing.

5
Phil says:
June 17, 01h

Apparently, the comments dislike PHP code; the system stripped my first attempt, so here it is again. Sorry for double posting :)

I guess I’ll share a few tips then, as I’m a much better PHP coder than I am a site designer.

1) Gzipping. Gzipping makes your pages smaller. This benefits you through saved bandwidth, and the user through a quicker download. It’s also extremely easy to enable. At the very top of your page, above ALL html and output (doctype included), put this:

<?php
ob_start('ob_gzhandler');
?>

and at the very bottom, put this:

<?php
ob_end_flush();
?>

2) Using the output buffer, it is extremely easy to encode ampersands (& to &amp;) sitewide. It is based on the following function:

mixed str_replace ( mixed search, mixed replace, mixed subject [, int &count] )

Basically, do this:

$string = str_replace('&', '&amp;', $string);

To do it sitewide, with the gzipping, at the top of your page do this:

<?php
ob_start('ob_gzhandler');
?>

and at the VERY bottom, do this:

<?php
$contents = ob_get_contents();
ob_end_clean();
echo str_replace('&', '&amp;', $contents);
?>

3) Another useful trick is to take care of ALL html characters (think: comments). All characters which have HTML character entity equivalents are translated into these entities.

<?php
echo htmlentities($comment_string);
?>

I don’t suggest doing this on the whole site, like with the ampersand trick, because all brackets will become html entities (< becomes &lt;).

4) The newline-to-br function. This one is really well known, but if you haven’t heard of it, it converts newlines (as in a textarea) to <br /> tags.

<?php
echo nl2br($str);
?>

Hopefully someone found this helpful.

6
June 17, 01h

I thought replacing ampersands would be that easy too, Phil, until I looked into it. Your approach will also replace ampersands that are already part of properly encoded entities.

There’s a useful regular expression (borrowed from the Amputator MT plug-in) in my latest post that will encode only those that aren’t already part of an existing entity: http://www.shauninman.com/mentary/past/amputate_the_w3c_validator.php

7
Phil says:
June 17, 01h

OK, I’m really sorry for spamming - last time, I promise :)

In the str_replace examples, make sure the second argument is the XHTML entity, &amp; (the comment system collapsed it when I posted).

8
Phil says:
June 17, 01h

Forgot all about that, Shaun :) Sticking that preg where I have my str_replace will make everything go smoothly.

Sorry about all my comments guys. I’ve been unlucky today.

9
June 17, 01h

Great write-up, Dave! This is indeed the spirit we need more of. Validation matters, but should be a no-brainer for the designer or content editor. We need better tools! When using things like MT or WP we shouldn’t need to worry about unencoded ampersands or comment validation. I can understand why people (‘designers’) stop caring, but I feel it’s important. (That’s why I joined the X-Philes…)

Maybe we should set up a site that gathers resources, tips and tricks, discusses software used for validation and so on. Something in the spirit of the CSS Zen Garden…

10
June 17, 01h

It’d be great if there was a Sourceforge project that acted as a library of ready-to-use web code snippets. (Maybe there already is?) One would be able to search for a particular problem (e.g. “unencoded ampersands”) along with the language(s) you want to use (e.g. CSS1, CSS2, CSS2.1, CSS3, IE5/Win-compatible CSS, IE5.5/Win-compatible CSS, IE6/Win-compatible CSS, IE5/Mac-compatible CSS, Moz1.6-compatible CSS, Moz1.7-compatible CSS etc.)

11
Dave S. says:
June 17, 02h

“The problem is that we are not in a position to be able to encode them.”

This is often the case too, no doubt there. But if you polled a sampling of a hundred designers, I think you’d find that only a handful have mastered a programming language. I’d be one of the many who hasn’t, for example (as evidenced by the problem Phil had with character stripping when posting his code snippets in this very thread).

12
June 17, 02h

Dave wrote: “What we need is to start working together like this on a larger scale. You’ve gone to lengths to programmatically fix improperly nested tags? Great, write it up.”

Right, that is exactly what I’m planning to do. I need a ‘post’ form on my site, both for my own blog posts and user comments (currently I’m still using direct phpMyAdmin database access -_-;;), so I will have to write such a thing sometime soon, and I do intend to share. I’m thinking about a partially automatic fix (uppercase tags, escaping characters, etc.) and a partially manual fix (you won’t get past the preview window if a tag hasn’t been closed properly). Of course paragraph tags will be inserted automatically, and the other tags and their attributes are cross-checked with a whitelist of allowed tags.
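For the whitelist part of a scheme like that, here is a one-line sketch using PHP’s built-in strip_tags. The allowed list is purely illustrative, and attributes would still need separate checking, as noted above.

```php
<?php
// Keep only an explicit whitelist of tags in user-submitted markup.
// strip_tags() removes disallowed tags but keeps their inner text;
// it does NOT inspect attributes, so those need a second pass.
$user_comment = '<p>hi <script>alert(1)</script></p>';   // example input
$allowed = '<p><a><em><strong><code>';                   // hypothetical list
$clean = strip_tags($user_comment, $allowed);
?>
```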

The same goes for the JavaScript I’m using, which works on XML (in an application/xhtml+xml strict environment). There’s still not enough information available about such things. A lot of scripts use document.write, for example, which won’t work there.


~Grauw

13
June 17, 02h

I liked the part about having a library.

What really amazes me is that this community of designers and coders is HUGE, but there is not one single standards forum out there that I have found.

When designers want that help, or coders want that opinion, where do they go? It’s great for people that work in a corporate environment and have resources all around them, but what about the freelancers and free-timers?

I always thought it funny that the best available CSS resource I have found is some mailing list that maybe I just don’t understand how to use, but it seems very archaic.

An archived, active help forum would seem like the thing this community could really benefit from. A place for the community to go primarily to help others and secondarily to receive help themselves.

14
June 17, 02h

A wiki would be nice…

There is already one for CSS: http://css-discuss.incutio.com/

OTOH, a site with articles, resources and sources such as my MSX Assembly Page site is also cool. It is surely easier to create (how many XHTML wikis are there, I wonder? :)), and it offers a form of control a wiki doesn’t have. One thing my page is missing is the ability to post comments on articles, similar to the PHP manual. Those are very useful; people could contribute through the comments, and the site owner could add the content of the valuable comments to the site itself.


~Grauw

15
Matt says:
June 17, 02h

One of these days I’ll sit down and write a blogging system that is XHTML compliant and automatically fixes silly things like <br> and unencoded ampersands. Hmm…
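The &lt;br&gt; half of that, at least, is nearly a one-liner. A sketch of the idea, with a function name of my own invention, assuming PHP with PCRE:

```php
<?php
// Sketch of one of the 'silly' automatic fixes: normalise HTML-style
// <br> (any case, any internal spacing, already-closed or not) into
// the self-closing XHTML form <br />.
function fix_breaks($html)
{
    return preg_replace('/<br\s*\/?>/i', '<br />', $html);
}
?>
```

A full XHTML fixer obviously needs much more than this (nesting, entities, attribute quoting), but each individual repair tends to be about this small.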

16
beto says:
June 17, 03h

What I have often found among the tens of designers that have been through our company is that very few of them have the necessary interest to learn how to do things right - information is available aplenty and is easy to find on the web for anyone interested enough in the subject - that’s the way I learned what I know. Instead, what most do is stick to their way of doing things “as they have always done” and refuse to learn more and better ways. So yes, from what we’ve seen so far, there is a fair degree of slacking among the design community in regard to learning web standards and coding languages. HTML and CSS are not C# or Python - again, anyone determined enough can find the resources and learn from the torrents of information out there. But those interested are few and far between.

17
Steven says:
June 17, 04h

Matt:
I will be waiting for the day that you do such a thing. :)


I think a few people should sit down and create a PHP or Python application which converts all this junk to standards-compliant XHTML. Anyone interested?

18
June 17, 05h

I’ve always thought that designers have three things to look at (in order of importance):
1. Browser compliance
2. Website design and functionality
3. Standards compliance

It is then the coder’s secondary job (after coding) to make sure that (s)he does not screw up the design or validity of the page, or allow end users to screw it up.

But the designer’s job is to design websites that work, and if they don’t have the time or ability to do so then we can’t really fault them. However, we must always stress that standards are important and that designers should do the best they can.

Steven, as someone who designs, codes and (attempts to) create content, I think it’s imperative that we make it as easy as possible for people to create valid XHTML and CSS, and I would love to help with a crap-to-valid translator (I’m a PHP man myself).

19
J. King says:
June 17, 08h

Pat:
Those are pretty skewed priorities, if you ask me. How do you define “browser compliance”? Whatever works in IE6? Whatever works in the most browsers possible? Whatever works in Mosaic? How is a designer supposed to know? Why should it be a priority at all? Let the designer design.

Said designer -should- have a basic understanding of how hypertext works, but agent compatibility should probably be the furthest thing from one’s mind. That comes later.

Myself, I couldn’t design a phone booth, but I’ve realised designs made by both people who understood the Web and those who did not. Making them valid and even meaningful to an extent was not hard.

In any case, #1 stems from #3, at least when it comes to markup (styling is something else altogether), so there’s no way that order is at all right to me. Mind you, I’ve never worked in a professional environment.

20
Anonymous Coward says:
June 17, 09h

Semi-Anonymous Coward here.

I just wanted to see what people thought about WordPress. It’s supposedly a standards-compliant CMS written in PHP. I’ve used it before, but found it rather bland and a bit short of being 100% compliant (but then again, is such a system even possible?). It’s not as feature-rich as Movable Type, but it’s still a solid system overall.

21
June 17, 09h

This reminds me a lot of when I was compiling release packages for a defence contractor. We would try to get the developers to run lint on their C programs before they submitted them to the Configuration Management system. There was always a lot of opposition, and a lot of the arguments that have been trotted out here are virtual paraphrases of the arguments in favour of not using lint then. Usually the developers won with the trumping arguments “the project can’t afford the time” or “the deadline will slip”. Of course, when the next release of the C compiler changed the interpretation of some obscure constructs, a lot of unnecessary time was spent trying to locate the real warning among all the noise of the should-have-been-fixed warnings.

Like it or not, (X)HTML/CSS is a (family of) programming language(s). Specifically, they are scripting languages interpreted by a browser to create visual images. Validation of code before submitting it to the interpreter should be a standard part of quality control for a commercial site.

Having said that, I think it is rank pedantry to flay a site on the basis of the occasional unencoded ampersand unless the target implementation requires absolute compliance. However, if I were hiring someone to develop web pages for me I would have no qualms at all about validating their home pages. I probably would not fail them on that basis, but I would tend to discount any claims they may have made about standards compliance in proportion to the number and severity of the errors.

The kind of program alluded to by Matt and Steve is no big deal in Perl. I wrote one to parse MS Word HTML output, clean up obvious stupidities, convert absolute font sizes, and output compliant HTML 4.0 in less than a week. It only took that long because I couldn’t download the HTML parser modules and had to write my own. I recently did an experimental hack to output XHTML 1.0 and that took half an hour but it is really ugly code.

22
June 17, 10h

Steve Gunnell wrote: “Like it or not (X)HTML/CSS is a (family of) programming language(s). Specifically they are scripting languages interpreted by a browser to create visual images…”

XHTML and CSS are not programming or scripting languages. XHTML is a markup language and CSS is more of a declarative language (although there’s probably a better term to describe it than declarative). Although they are parsed and interpreted by the browser, that does not make them scripting languages.

23
Matthias says:
June 17, 11h

Lachlan, I don’t think Steve Gunnell was trying to redefine XHTML/CSS; he was writing about quality control. And it doesn’t matter whether XHTML/CSS is a scripting language, a markup language, a declaration language, or a musical notation language; what matters is the quality control that assures the product follows the technical specification.

24
cmcooper says:
June 18, 01h

If the resources are available, I think a new “kit” site needs to spring up out of this discussion. www.webstandardskit.org it would be named, and it would be more than just trading flames. It would be where methods of standards-based design would be shared freely. It would be the place to hold these discussions in a more orderly manner. And hopefully it could be useful in the education field, because there is a whole generation of designers being educated that knows practically nothing about these crazy things we call “standards”. Most college kids don’t know what standards really mean for any industry. I say this because I’m a student myself, and when I mention standards in the classroom they think I’m talking about popular design techniques, so they tell me to think outside the box and stop listening entirely.

I think a large problem is that solutions to these problems are not centrally located. If an individual is not aware of the issues with standards, they don’t really have a place to start looking. WaSP is not enough in this respect. The blogosphere is too loosely connected and spread out. Things must be organized and consolidated. The resources might not be there for a project of this magnitude, but it might be the only true step toward a solution.

Being a student I naturally assume that I will be overlooked in discussions such as this. However, I think it is becoming obvious that it is the same group having these discussions with few new voices joining in. To me that means that word is not really getting out very far. Blogs are not the appropriate way to solve this problem. It’s time to do more. I’m sure some of us are willing.

25
Tom says:
June 18, 01h

What we need is a designer/code forum wiki, whatever.

The basis would be, “I am a designer, I need to know how to validate my code, could you help me?”

“I am a coder, I want to learn how to design things better, could you help me?”

It could be a place to share ideas, code snippets, and design ideas. Sounds great to me.

26
Vincent Grouls says:
June 18, 03h

Tom (http://www.mezzoblue.com/mt/preview.php#c005779):
> What we need is a designer/code forum wiki, whatever.

I agree with the fact that there should be a central place for code snippets and something more pleasant to look at than e.g. the css-d mailing list. However, I have once visited a Wiki and it didn’t attract me at all. Having to go to a Wiki to get or submit code snippets doesn’t appeal to me at all, even though I am always in for help. I’ve got experience with PHP/MySQL and with (X)HTML and CSS and I would be happy to share a bit of both worlds with everybody.

jim (http://www.mezzoblue.com/mt/preview.php#c005748):
when you’re both a designer and a coder, does that make you a robot with identity? ;-)

ps, first poster - have been lurking for a few months now. I would like to take the opportunity to thank Dave and all others out there to help me transition to web standards.

27
michael says:
June 18, 04h

This is an interesting discussion with a lot of good points on both sides. I’d even go so far as to say that both sides are right. The question is what are they right about? Non-standard code perpetuates some real evils. Valid but ugly web pages don’t win clients, who really care more about the marketing potential of a web site than about some technical mumbo-jumbo. Though I have found that search engine ranking, which benefits greatly from well structured pages is a good selling point.

To a large degree we are dealing on the level of philosophy and esthetics. What is beautiful? Is it clean, well structured and commented code? Is it a drop-dead-gorgeous layout? Each, of course, to the different groups. I remember reading a Slashdot link about programmers having similar personality traits to artists. So we have here two, if you will, schools of art, one bowing to unicode deities, the other to pixel perfect ones. Both are right and, the CSS Zen Garden masters aside, few people have both the talent and skill to become good at both.

How willing are we to embrace the fact that standards have sex appeal only to geeks and philosophers, and that standards would make everybody’s job easier if only the browsers actually implemented them? It’s both, and, folks.

I’m a designer. I want to use compliant code. My templates always validate. And if a client wants a particular weather widget, scrolling DHTML hack or CMS I’m not going to risk losing a contract to fight against it. If I can find a viable alternative in a short time, I’ll lobby for it but time is money folks and I get paid for designing, not crusading or spending days learning enough programming to make ampersands go away. If there is an out of the box solution that I can employ myself with my limited programming skills and it isn’t too expensive, is reusable and has good documentation, adequate to hand hold a novice programmer through installation and use, great. I’ll use it. But it had better work and be user friendly. User friendly is exactly what pages of scripting code are not.

Don’t give me grief. Give me good, easy to use tools. Don’t give me a PHP class or a PEAR library. I’m not a programmer. Don’t give me a JavaScript snippet. Give me a complete, copy and paste script with every detailed step laid out. Pictures would help too. Lots. Better yet, make it a Dreamweaver extension. This ain’t selling out. It’s a matter of skill and time. I don’t have the skills to rebuild my transmission or repair my microwave either. Yet I use both every day. I could spend the rest of my days meditating on Photoshop and never learn all of that essential design skill. It just isn’t possible to know it all.

I appreciate and applaud the standards crusade and the standards pioneers. I do what I can with my modest skills. However, right now, complete standards compliance is a lot like Linux: wonderful in concept but not quite ready for prime time for the non-programmer, which describes most of us.

28
Dave says:
June 18, 05h

What this whole thing boils down to, from my perspective, is a bunch of people complaining about lazy people they know and railing against their misconceived stereotypes of these lazy people. There are a few “coders” (to use Dave’s vernacular) who have had bad experiences with lazy “designers”, and a few “designers” who have had bad experiences with lazy “coders”.

My solution to the whole problem is that if its broken, fix it. If you don’t have time to fix it, don’t complain. I work mainly as a front-end guy (that would be “designer” to you all) doing Graphics and HTML, but my hands are tied to the output of the HTML because of the corporate mandate that we keep everything the same so that head “coders” and self-proclaimed “designers” in the corporate office can understand it. These guys think that renaming their table spacer gif to s.gif is the cutting edge of bandwidth efficiency.

So what do I do? I don’t complain about it, and do freelance design and coding in my spare time where I can actually create semantically accurate XHTML layouts generated by nice compliant CMSs. I use an open source CMS because when I find problems I can fix them. Therefore, I don’t complain.

So, to all of you complaining about how lazy “designers” are, or about how lazy “coders” are, why not consider the fact that you are mainly whining about specific people, not the majority. Stop generalizing.

29
June 18, 05h

cmcooper: “I think a new “kit” site needs to spring up out of this discussion. www.webstandardskit.org it”

Tom: “What we need is a designer/code forum wiki, whatever.”

They’re both really good ideas. I would love to have a central repository for (X)HTML, CSS, PHP, JSP, JavaScript, etc… code snippets, and perhaps even images for backgrounds (or whatever) that would all be freely available and contributable by anyone, perhaps released under a Creative Commons licence. Having such resources stored in a central repository would be so much easier than searching through countless blogs/websites, mailing lists and newsgroups. However it’s implemented, I sure would be happy to be involved with such a venture, sharing and learning various techniques with others.

I’m most definitely a coder. I find writing valid XHTML extremely easy, and would be glad to help others improve their techniques. Conversely, I would like to have others help me out with my design techniques. As anyone can tell by looking at the state of my site (I’m working on it), my artistic skills are not quite up to scratch. Sure, I can write CSS quite easily and understand most CSS2 properties; I’m just not a graphic artist.

30
June 18, 06h

Oops… Dave, you might want to fix that bug: it looks like those quotes (U+201C and U+201D) that were copied from cmcooper’s post got converted to something else for some reason. This comments page no longer validates, giving a UTF-8 encoding error.

It’s good to have standard quotes (U+0022) and other characters converted appropriately, but real left and right quotation marks, or probably any character above U+007F, shouldn’t be converted to anything, as long as they’re valid UTF-8.
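A sketch of that distinction using two real PHP builtins: htmlspecialchars() encodes only the markup-significant characters and leaves valid UTF-8 punctuation alone, while htmlentities() converts everything it has an entity for.

```php
<?php
// Curly quotes (U+201C/U+201D) plus characters that genuinely need
// encoding. htmlspecialchars() with ENT_NOQUOTES and an explicit
// charset touches only & < >, leaving the UTF-8 punctuation intact.
$s = "\xE2\x80\x9Chi\xE2\x80\x9D & <b>";           // “hi” & <b> in UTF-8
echo htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8');  // “hi” &amp; &lt;b&gt;
?>
```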

31
June 18, 06h

“Pat:
Those are pretty skewed priorities, if you ask me. How do you define “browser compliance”? Whatever works in IE6? Whatever works in the most browsers possible? Whatever works in Mosaic? How is a designer supposed to know? Why should it be a priority at all? Let the designer design.”

When it comes to designing web pages, the designer MUST know about browser compliance. They should know about common bugs and workarounds.

Let’s assume they don’t. Let’s say someone is employed who creates a great-looking design in Photoshop. Here’s what they come up with:

1) The text sits on a transparent layer (to show the background) and the header uses a text shadow.

2) The columns have rounded borders and exactly 5 pixels of padding.

3) The company logo has a fading edge to it and must sit on many pages with different coloured backgrounds.

4) When the user hovers over a paragraph, it should light up with a different text colour.

5) Fonts, which will be set in pixels, must be enlargable for the boss’s father, who has poor eyesight.

Of course the website must work in IE6, the majority browser. But because the designer in this case is ignorant of browser compliance, the design will fail on every point I listed above! E.g.:

1) Transparency only works in a few browsers so others will show a solid background. Text-shadow only works in Safari.

2) Rounded corners? Forget IE6. It can be done with images but not CSS. Padding? We all know about the problems with the box model in IE.

3) The company logo needs a PNG with alpha transparency. It won’t work in IE6 (unless a hack is used).

4) Only links work with hover in IE6 so forget highlighting paragraphs.

5) Fonts set in pixels can’t be resized in IE6.
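Point 5, at least, has a well-known workaround: size fonts in relative units so IE6’s Text Size menu still works. A sketch (the selectors and values here are illustrative, not from any particular site):

```css
/* ems and percentages resize in IE6; pixels do not */
body { font-size: 76%; }            /* roughly 12px at default settings */
h1   { font-size: 1.5em; }
p    { font-size: 1em; line-height: 1.4; }
```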

Time and again I’ve had to design with browser compliance foremost in mind when tackling a new layout. You can’t just design it on paper and wish for each browser to work; web design isn’t like that. Of course this is a major frustration.

Then there are a hundred other considerations any designer should know about. What if the site is viewed on a mobile phone or PDA? What if the user is colourblind? What happens if you turn off the images? Is the site still usable?

To be a “designer” you have to be both a coder and an artist. There is no way to be purely one or the other. Unless you have a two-person team, so one can do the coding while the other does the fancy gradients for the menu buttons. But even then they must work together. A dialogue must be ongoing as to what works on the web and what doesn’t. For the single-person team, the designer must know all these things. At the very least, they must know HTML and which tags work in which browsers.

32
paul says:
June 18, 06h

my biggest beef with standards is the crazy zealots who always seem personally offended if a finished project doesn’t completely validate or if there’s a table used for something that a table probably should be used for.

standards aren’t just a black & white jump, they’re a progression. so instead of jumping down the throats of others because a project isn’t 100% yet, people should appreciate that steps are being taken in the right direction. and if they have so much time to cut other people down, they could instead offer their own knowledge as assistance.

that’s why i don’t read the comments on all of these “css awards” sites, because they’re all the same. people jump all over the winners code and turn it into a flame war.

it disgusts me a little sometimes that my own industry has become so petty and immature.

33
June 18, 06h

David S.,

> No browser manufacturer is going to stop displaying invalid markup in the near future.

Please don’t perpetuate this myth; it simply isn’t true. New browser versions can and do break on non-standard code where their ancestors did not. This goes back at least as far as Netscape 1.2, continues through present-day Mozilla and Internet Explorer, and is set to go even further with Internet Explorer’s imminent update (granted, that one’s an HTTP issue and not an HTML issue, but the point remains).

I’m not saying that browser vendors are doing this purposefully; it’s just that when they update their rendering engines, they aren’t anywhere near as likely to catch regressions relating to broken HTML as they are with proper HTML. Common sense alone should tell you that, but history agrees.

34
David says:
June 18, 06h

I’ve had standards-compliant comment parsing for a few months now, thanks to some PHP sanitizing and Tidy. Here’s the post I wrote on it: http://wadny.com/news/during/2004/6/18/1036/

I do agree that a central site on pro-standards website design and coding would be useful, but I have a nagging feeling this already exists somewhere and just isn’t being used properly. Perhaps the W3C should have something like this on their website?

35
Jay says:
June 18, 06h

I think Eric Meyer says it best on his site today:

There’s only so much of the constant chest-heaving, garment-rending dramatics I can handle before I glaze over and start to closely contemplate my cuticles.

36
June 18, 08h

Rather than lumping all Validation errors together, and arguing about what will or will not go wrong with “invalid” pages, it might further the discussion a bit if we broke down validation errors into different classes, and discussed each separately.

In descending order of severity, my list would be:

1) Well-formedness errors.
The only good reason for using XHTML in preference to HTML4 is that it’s XML. And (the argument goes) there are all kinds of cool things you can do now, or will be able to do in the future, with XML content. (I’ve also read a lot of totally bogus arguments for using XHTML; this is the only one that holds any water.)

But if it’s not well-formed, it’s not XML, and you can’t do cr** with it. So the whole advantage of using XHTML has been lost.

(And, if, like me, you’re doing something that actually *requires* XHTML, the browser will throw a parsing error and not render your page at all.)
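Well-formedness, at least, is trivially machine-checkable: any XML parser gives a definitive yes or no. A sketch in Python (the helper name is my own, for illustration):

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    # An XML processor is required to reject any document that is not
    # well-formed, so a failed parse is a definitive answer.
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>fine</p>"))           # True
print(is_well_formed("<p><b>mis-nested</p>"))  # False
```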

2) Errors that do not break well-formedness, but otherwise break some functionality of the page.

In the previous comment thread, I was royally, and *properly*, lambasted for having a page with two <div>s with the same id on it. Id’ed elements are link targets. If you have two elements with the same id, you break that functionality.
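Duplicate ids are also easy to catch mechanically. A rough sketch in Python; the naive regex is purely for illustration, and a real checker would walk a parsed tree instead:

```python
import re
from collections import Counter

def duplicate_ids(markup: str) -> list[str]:
    # Count every id="..." attribute value. Ids must be unique within a
    # document, so any count above one breaks link targets (and validity).
    counts = Counter(re.findall(r'\bid="([^"]*)"', markup))
    return [value for value, n in counts.items() if n > 1]

print(duplicate_ids('<div id="nav"></div><div id="nav"></div>'))  # ['nav']
```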

Similarly, many people, enamoured of nice typography have turned on smart-quoting. For obscure reasons, this frequently results in a profusion of invalid “garbage” characters on their pages. If the purpose was to achieve nice typography, and the result was a profusion of garbage characters, then I’d say you broke something along the way.

3) Errors which are still well-formed, and which don’t lead to any evident loss of functionality (at least with current browsers).

In this class, I’d put errors like placing inline content directly in a <blockquote> element. This was legal in HTML 2 & 3, but not in HTML 4 or XHTML. For historical reasons, it renders as intended in all current browsers (for documents declared as HTML4 or XHTML 1.x). And it *probably* will continue to do so in future browsers.

This is the most troubling class of errors because you just don’t *know* whether it will break in future browsers. As Jim Dabell says, there’s plenty of historical precedent for that happening.

4) Errors that are actually *necessary* to achieve compatibility with some browsers.

In this class, I’d put things like using <object><embed></embed></object> to include multimedia content. The <embed> element is not part of any Standard, but it’s thoroughly necessary to support some commonly-used browsers. The really Standards-compliant browsers (or, more frequently, browser/plugin pairs) will pick up the multimedia from the <object> element, and will never attempt to render its content (which contains the invalid <embed> element). “Legacy” browsers don’t know what to do with multimedia in the <object> element, but do support multimedia via the <embed> element. So they are OK too.
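For reference, the double-wrapping described above looks something like this (the file names and dimensions are placeholders; the classid is the standard Flash ActiveX one):

```html
<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
        width="320" height="240">
  <param name="movie" value="clip.swf" />
  <!-- Standards-compliant browsers render the <object> and never
       reach the invalid <embed>; legacy browsers do the reverse. -->
  <embed src="clip.swf" width="320" height="240"
         type="application/x-shockwave-flash" />
</object>
```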

This last type of error is, I think, the most defensible, as it is *necessary* to make things work today, and *unlikely* to break down in the future.

I know there are javascript tricks for hiding the <embed> element from the Validator, while still presenting it to the browser. The only ones fooled by such tricks are the Validator and the web designer who employs them. The browser – which is the thing we are supposed to be sending valid content to – certainly isn’t fooled.

A better approach would be to define a custom DTD which includes the <embed> element. This would satisfy the Validator, and would be an *honest* representation of the actual content of the page. But, since no existing browser uses a validating parser, supplying a custom DOCTYPE to existing browsers is a little pointless. Future browsers may well use validating parsers, in which case this becomes *the* way to go.

So there’s my list. From the truly bad to the probably necessary.

What’s yours?

37
June 18, 08h

I think Chris Hester said what I was going to say better than I could have. When I made that list (comment 18) I wasn’t saying that designers have to look at browser compliance first, but rather that it’s more important than the design in the end. If a design I make doesn’t work in IE, I have to make sacrifices. I have to make sure that everything is usable in every major browser. If a designer cannot do this because of laziness or ignorance (a lot of ignorance is laziness), then I wouldn’t hire them as a web designer. This really stems from design, since usability is really the most important part of any design.

I would not sacrifice design for validation if I had to make the choice. Luckily, pretty much every design can easily be modified to be valid. Although in the workplace you can’t always make your HTML valid, and that’s understandable.

38
Dave S. says:
June 18, 08h

Everyone please pause with this thread for a moment, and go read Jeffrey Zeldman’s ‘Production for Use’

http://www.zeldman.com/daily/0604e.shtml#use

Lest we lose sight of the bigger picture.

39
David S says:
June 18, 09h

I think I speak for everyone when I agree with Dave that we are all sick of this back and forth. It sucks. It doesn’t get anyone anywhere.

From what I’ve seen, there are three camps (warning: generalization!):

1. There are people from the “old school” who don’t want to change the way they work and are basically making themselves obsolete.

2. Then there are those who see standards as the end-all solution to the semantic web woes — good thought, but it isn’t true, and shouldn’t be treated as so. A lot of these folks are the “I can validate my blog, so standards are it” kinds of folks.

3. Then there are the practical professionals who have been around long enough to be adept at old school and standards-based techniques and view the standards as another tool in their toolbox. Standards are a huge help to those that know how to use them and are a preferred method in most cases. Being able to discern between “most cases” and “all cases” is one of the major differences between camps 2 and 3.

Of course, this is an over-simplification, and I’m sure to piss someone off in the process. But I’m tired of the standards “nazis” and the “my way or the highway” mindset. I definitely group myself in the third camp. I always use standards as much as I can, whenever I can, but sometimes there comes a point where they aren’t the end-all solution we wish they were. At that point, if you can make the right decision, be it using standards or not, you’ve come a long way.

(Sidepoint: The whole if-you-don’t-validate-your-page-will-stop-showing-up argument is a bit bogus. No browser manufacturer is going to stop displaying invalid markup in the near future. Twenty years from now, that might be the case, but who knows if HTML will even be in use by then.)

40
Susanna says:
June 18, 11h

Tom: “What we need is a designer/code forum wiki, whatever.”

Lo these many years ago, there was a great forum like this on CNet called Builder Buzz. But it sucked resources without providing enough ROI and so it was axed thusly. Undaunted, the Buzziens took the spirit of that forum and rebuilt a new one from scratch, with PHP, and dubbed it Hiveminds. It’s still around, and I still find it immensely useful. Since I am the only designer at the place where I work, the Hiveminders are the peers I can bounce ideas off of, and ask questions. http://forums.hiveminds.info

41
June 18, 11h

When I joined the Macsanomat team (Macsanomat is the Finnish Mac news site: http://macsanomat.fi/ ), I started by retrofitting the in-house CMS with syntactic correctness. In order to prevent the editorial team from entering bad markup into the system, I wrote an ad hoc syntax checker that accepts a subset of HTML. Code is available at http://iki.fi/hsivonen/HTMLSyntaxChecker . (Yes, I know the code is ugly. It was my first PHP project and I hadn’t had education in compiler technology at the time.)

It’s the 2000s, but PHP4 lacks Unicode support by default. I’ve adapted Mozilla’s UTF-8 code to PHP. Code: http://iki.fi/hsivonen/php-utf8/ Instances of deployment: http://macsanomat.fi/lite/ and http://macsanomat.fi/atom

However, despite the pointers to PHP code above or, rather, because of my experience with PHP, I recommend you stay away from PHP4 for new projects if you want to produce proper HTML or XML. The PHP model of mixing program code and HTML literals is not the way to go if you’re aiming for correct markup. The crucial Unicode infrastructure is missing. The XML tools that are available by default and with reasonable effort are woefully inadequate.

(Re: WordPress: Despite the hype, WordPress does not give you 100% standards compliance, nor does it protect you from the problems of PHP. Just last week I observed wordpress.org serving ill-formed markup to Firefox as application/xhtml+xml, causing an error message to appear in place of content.)

I recommend using a development platform with a solid Unicode and XML infrastructure in place. (I hear PHP5 fixes everything. In the meantime, I’ve used Java. However, I’m explicitly not recommending JSP.) In order to get rid of mis-nested tags and under- or over-escaping, I recommend building a document tree (e.g. using the DOM) and then serializing it. (With this approach, it is a good idea to design for caching from the beginning in order to avoid excessive repetitive tree building.)
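The tree-then-serialize approach looks something like this in Python’s standard library (the function and markup here are invented for illustration): mis-nesting becomes impossible by construction, and escaping happens exactly once, at serialization time.

```python
import xml.etree.ElementTree as ET

def render_comment(author: str, body: str) -> str:
    # Build a document tree; the API cannot produce mis-nested tags.
    div = ET.Element("div", {"class": "comment"})
    cite = ET.SubElement(div, "cite")
    cite.text = author          # escaped at serialization, never by hand
    p = ET.SubElement(div, "p")
    p.text = body
    return ET.tostring(div, encoding="unicode")

print(render_comment("A & B", "x < y"))
# <div class="comment"><cite>A &amp; B</cite><p>x &lt; y</p></div>
```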

Incidentally, I happen to have a related write-up: http://iki.fi/hsivonen/cms/te.html

Those who read Finnish may also be interested in: http://www.hut.fi/u/ykarikos/koulu/webui.pdf

42
June 18, 12h

Steve said:

“I think a few people should sit down and create an PHP or Python application which converts all this junk to standards-compliant XHTML. Anyone interested?”

Someone IS interested, Steve: the PHP development team. PHP 5.0 is being bundled with the HTML Tidy ( http://www.w3.org/People/Raggett/tidy/ ) plug-in, which catches and fixes exactly this kind of problem.

Tie that in with the built in XML parser modules in PHP 5 and you have the basis for a robust, standards friendly cruft filter.

If you’re not running a PHP5 set-up, HTML Tidy can be plugged into a number of other tools. I use it daily (hourly?) as part of my development process, via my editor of choice, HTML-Kit ( www.chami.com ).

43
June 19, 01h

> Future browsers may well use validating parsers, in which case this becomes *the* way to go.

I don’t believe in browsers switching to validating XML processors. Compare the size of the XHTML + MathML DTD to the size of your usual blog post. The DTD with all its modules is huge. Retrieving it and parsing it would be a significant performance problem. The whole point of introducing the concept of well-formedness was to relieve applications like browsers from the burden of parsing the DTD.

I think including a doctype declaration in application/xhtml+xml content served on the Web is mostly pointless. For checking compliance with the rules of the XML vocabulary, out of band information about the Relax NG schema or DTD can be provided to the validator. The browsers don’t care.

(Note: Mozilla cheats with the XHTML+MathML DTD in order to make it look like it has parsed the DTD. It does not parse the full DTD, though.)

44
Serge M. says:
June 19, 03h

I haven’t seen one comment mentioning Interactive Tools’ htmlArea. I use this on my site, and it produces valid XHTML Transitional, hoorah!

http://www.interactivetools.com/products/htmlarea/

I’ve got it linked to an Access database to store the pages, and ASP to retrieve the content.
Yes, the URLs could be better, but still, it was damned easy to write!

45
mark says:
June 19, 03h

it may be unrelated, but this makes me think of a recent conversation where i had to point out that even the RNIB (Royal National Institute of the Blind) web site uses images as text

46
June 19, 05h

“Is a custom DTD only possible with XHTML?”

No, but XHTML 1.1 is *modular*, so a custom DOCTYPE which adds (say) the <embed> element to XHTML 1.1 requires only a few lines.
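A sketch of what those few lines might look like, using the DOCTYPE’s internal subset to declare <embed> and splice it into the content model. I’m assuming the %Inline.extra; extension hook that XHTML Modularization provides; verify against the actual DTD modules before relying on this:

```dtd
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
  <!-- hypothetical: extend the inline content model, then declare <embed> -->
  <!ENTITY % Inline.extra "| embed">
  <!ELEMENT embed EMPTY>
  <!ATTLIST embed
    src    CDATA #IMPLIED
    type   CDATA #IMPLIED
    width  CDATA #IMPLIED
    height CDATA #IMPLIED>
]>
```

(Declarations in the internal subset are read before the external DTD, which is what lets the parameter-entity override take effect.)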

“If so that would make sense for Mozilla to do that as XHTML documents are supposed to be valid XML.”

I don’t think an “XHTML 1.1 + Embed 1.0” DOCTYPE should cause Mozilla to throw a parser error when it encounters an &mdash; in the document.

That’s what it does now, because it *doesn’t know* what entities are defined in a custom DOCTYPE like the one above, and it can’t parse the DTD to find out.

47
June 19, 06h

“The DTD with all its modules is huge. Retrieving it and parsing it would be a significant performance problem.”

1) For “known” DTDs, you would not have to download them, merely use a local copy. (The Validator does this, too. A complete set of “known” DTDs is a few megabytes.)

2) With the Modularization of XHTML, a DTD which loads the XHTML 1.1 DTD and adds the <embed> element would only be a few lines long. Not a big burden to download and, like CSS files, etc, something you could cache locally, for subsequent reuse.

3) Maybe *parsing* the DTD is a performance-hit. But it’s better than Mozilla’s (I haven’t tested other browsers) current behaviour of treating a document with a custom DOCTYPE as a generic XML document and therefore throwing a parsing error if it contains HTML entities (for instance).

48
Jim says:
June 19, 09h

I find that the SmartyPants/MarkDown combo for MT works fine. No validator probs in my comments (thus far).

49
June 19, 12h

David S wrote: “3. Then there are the practical professionals who have been around long enough to be adept at old school and standards-based techniques and view the standards as another tool in their toolbox.”

I’m not sure I can agree with your generalisation there. Standards shouldn’t be seen as a “tool”, in other words an optional extra you either choose to use or not. Surely what Zeldman et al have been banging on about for years is that standards should be the ONLY way ahead. Use them or be left behind.

I’m aware from the comments above that this is not totally practical with today’s browsers, so non-valid hacks are still required. (I sometimes use ‘nolayer’ myself to stop Netscape 4 seeing some HTML, even though it’s not valid.) But designers should be using as much of the standards as they can.

The reasons are obvious. Future internet devices will (hopefully) require standards-compliant code by default. By doing what we can today, we can prepare for any new device imaginable, simply by following the W3C standards. If we don’t, we’re really designing for the desktop PC running Netscape or IE versions 4.

What’s more, ignoring standards gives you code that is a nightmare to adapt to a new device. With valid XHTML, it’s a breeze, as the hard work can be done via CSS, assuming the new device handles that. Even better is to use XML and output your code as required.

Sadly I agree when people say that browsers will continue to display invalid markup for years to come. Imagine a library in 2024 that can’t access pages twenty years old that were written in HTML 4. It would be crazy to deny masses of content by only displaying valid pages. What we might see, though, is XHTML 2 browsers that require a plug-in to view older sites. Or people might use PHP and write special parsers to view them. Who knows?

I doubt current markup will last as it stands though - it’s just too limiting. Hence the need for custom DTDs and XML with XSLT or a similar functionality.

Jacques Distler wrote: “But it’s better than Mozilla’s (I haven’t tested other browsers) current behaviour of treating a document with a custom DOCTYPE as a generic XML document and therefore throwing a parsing error if it contains HTML entities.”

Is a custom DTD only possible with XHTML? If so that would make sense for Mozilla to do that as XHTML documents are supposed to be valid XML.

50
June 20, 01h

“Compare that with the download size of Opera and Firefox for Windows. The DTDs are absurdly large (even if you remove comments).”

% ls -l sgml-lib.tar.gz mozilla-mac-MachO.dmg.gz
-rwxr-x---  1 distler  staff  15413257 Jun 15 19:08 mozilla-mac-MachO.dmg.gz*
-rw-r-----  1 distler  staff   3063146 May  6 18:40 sgml-lib.tar.gz

20% of the size of the (current) Mozilla download itself. Presumably *much* smaller if you remove the comments and various redundant or obsolete files (e.g., we presumably are only talking about XHTML (and related XML) DTDs). I’m gonna guess maybe a 4-5% increase in the size of the Mozilla download.

“It follows from the XML spec that entity references are inherently unsafe for Web documents, because non-validating XML processors are allowed not to expand them and someone may be using a non-validating XML processor to parse the content you serve on the Web.”

The same would be true if we were handling local files. I don’t think the fact that the documents are delivered over HTTP has anything to do with the matter.

“Since entities are inherently unsafe for the Web, the big bug is that a set of entities was included in the XHTML DTDs, which confuses authors into expecting them to work.”

Content authors are going to laugh at you if you tell them, “no named entities in your web documents; use numeric entities or UTF-8 characters *only*. Doesn’t matter if the Spec allows them.”

That’s right, folks, no “&nbsp;” or &copy;, you need to write “&#160;” and &#169; instead.
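The named and numeric forms are interchangeable as far as the characters they denote; in Python, for instance:

```python
import html

# &nbsp;/&#160; and &copy;/&#169; decode to the very same characters,
# so the dispute is purely about which spelling survives XML processing.
assert html.unescape("&nbsp;") == html.unescape("&#160;") == "\u00a0"
assert html.unescape("&copy;") == html.unescape("&#169;") == "\u00a9"
print("named and numeric references are equivalent")
```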

And MathML documents are hard enough for humans to read already. They would be *perfectly* illegible if you had to write numeric entities, like &#8747; and &#8750;, for all your symbols, instead of named entities, like &int; (or &Integral;) and &ContourIntegral;.

“But what’s the point? Declaring some new stuff in the DTD does not add any capabilities to the browser.”

Why include a DOCTYPE declaration at all, then? After all, the capabilities of the browser are fixed, regardless of what DOCTYPE you declare, or whether you declare one at all.

You’ve given one answer already, to do with how the document is ultimately parsed.

Moreover, the capabilities of the browser are not totally fixed. They can be augmented via plugins (like the MathPlayer plugin for IE/Win). Say someone writes a “fooML” plugin for Mozilla. We should be able to start serving valid “XHTML + fooML” or “XHTML + MathML + fooML” documents to Mozilla equipped with the fooML plugin.

XHTML and MathML declare named entities (I’m not sure about fooML :-). These should not cause Mozilla to throw a parsing error.

I don’t really know (or care) whether using a validating parser is the “solution” to this problem – though, clearly, *something* has to do the job of expanding the named entities. But, if the Modularization of XHTML is to be anything but an utterly useless curiosity, content authors ought to be able to declare new compound DOCTYPEs and serve such documents – named entities and all – to capable browsers.

52
June 20, 12h

“1) For ‘known’ DTDs, you would not have to download them, merely use a local copy. (The Validator does this, too. A complete set of ‘known’ DTDs is a few megabytes.)”

Compare that with the download size of Opera and Firefox for Windows. The DTDs are absurdly large (even if you remove comments).

“2) With the Modularization of XHTML, a DTD which loads the XHTML 1.1 DTD and adds the element would only be a few lines long. Not a big burden to download and, like CSS files, etc, something you could cache locally, for subsequent reuse.”

But what’s the point? Declaring some new stuff in the DTD does not add any capabilities to the browser.

“3) Maybe *parsing* the DTD is a performance-hit. But it’s better than Mozilla’s (I haven’t tested other browsers) current behaviour of treating a document with a custom DOCTYPE as a generic XML document and therefore throwing a parsing error if it contains HTML entities (for instance).”

Parsing the DTD certainly is a performance hit. It follows from the XML spec that entity references are inherently unsafe for Web documents, because non-validating XML processors are allowed not to expand them and someone may be using a non-validating XML processor to parse the content you serve on the Web. Since entities are inherently unsafe for the Web, the big bug is that a set of entities was included in the XHTML DTDs, which confuses authors into expecting them to work.

“I don’t think an ‘XHTML 1.1 + Embed 1.0’ DOCTYPE should cause Mozilla to throw a parser error when it encounters an &mdash; in the document.”

I agree. I think Mozilla should display a placeholder for unexpanded entities. (Safari displays a placeholder. Opera silently ignores unexpanded entities.)

“That’s what it does now, because it *doesn’t know* what entities are defined in a custom DOCTYPE like the one above, and it can’t parse the DTD to find out.”

Not quite. Safari and Opera do not throw a parsing error. Their copies of expat merely inform the application about the unexpanded entity as per
http://www.w3.org/TR/REC-xml/#include-if-valid
The applications choose not to treat this condition as a fatal error.

Mozilla, on the other hand, feeds a zero-length stream to expat as the DTD. Hence, expat thinks it has seen the DTD but the DTD didn’t contain a declaration for the given entity. Therefore, expat treats the condition as a fatal error as per
http://www.w3.org/TR/REC-xml/#wf-entdeclared

The reason why Mozilla does this is that Mozilla’s copy of expat is always configured to resolve external entities, because the entity facility is used for localization of local XUL files.

From the authoring point of view, the safe way to use an em dash is to use the proper UTF-8 byte sequence.
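That is, write the character itself rather than any reference; the em dash U+2014 is the three bytes E2 80 94 in UTF-8:

```python
# U+2014 EM DASH encoded directly as UTF-8; no entity reference needed,
# so no XML processor ever has to expand anything.
em_dash = "\u2014"
assert em_dash.encode("utf-8") == b"\xe2\x80\x94"
print(em_dash)
```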

53
June 21, 02h

Apologies first off, as this is going to seem a bit like spam, just thought it’d be nice to try and offer a place for people to post their articles about some of the code snippets and tools that seem to be coming through.

http://www.cre8asite.net is a web dev niche directory run by the same people as www.cre8asiteforums.com. It’s meant to be a place for specific articles rather than websites as a whole. http://www.cre8asite.net/browseCats.asp?category=14 is the PHP Code Snippet category, for example.

If anyone feels so inclined, feel free to submit to it. As others have suggested, it would be nice to have a central resource for these kinds of things. Could be a blog post, an article, whatever. If not, never mind :D

Oh, and spot on linking to Zeldman’s blog posting, Dave. A nice reminder to keep the big picture in mind; I’d just used it for exactly the same purpose somewhere else :D

54
June 21, 03h

The first comment talks about designers and coders/developers. How do you decide which one you are?

Can you be both? What’s your title? :-)

55
June 21, 09h

We’re so off topic that I’ll try to be brief.

“mozilla-mac-MachO.dmg.gz”

That’s Mozilla for OS X, not Firefox for Windows.

“The same would be true if we were handling local files. I don’t think the fact that the documents are delivered over HTTP has anything to do with the matter.”

Hopefully, you can choose the software on your local machine. You can’t choose the software other people use.

“Doesn’t matter if the Spec allows them.”

The XHTML and MathML specs can’t redefine what XML is. Just like the HTTP spec can’t redefine TCP.

“That’s right, folks, no ‘&nbsp;’ or &copy;, you need to write ‘&#160;’ and &#169; instead.”

You could press option-g to type a copyright sign. A lot easier. I am not suggesting authors use numeric character references. I am suggesting they use UTF-8.

“Why include a DOCTYPE declaration at all, then?”

No good reason as far as I can tell.

“I don’t really know (or care) whether using a validating parser is the ‘solution’ to this problem – though, clearly, *something* has to do the job of expanding the named entities.”

I suggest your editor or CMS do it.

56
Matthijs Aandewiel says:
June 21, 11h

To continue on Josh Bryant’s comment about the fact that there is not a single standards forum: I think we should open an IRC server. IRC servers are great for passing on knowledge; all I learned about mastering CSS, HTML, XHTML, PHP and MySQL comes from IRC chat rooms (and a bit from Jeffrey Zeldman’s orange book ;))

57
June 22, 02h

Jacques Distler wrote: “And there’s no point in assigning a DOCTYPE to your document, because the client isn’t going to do anything with that DOCTYPE declaration anyway.”

There is a reason for including a doctype… it’s called doctype switching. How else are modern UAs supposed to determine whether to render in quirks mode or standards-compliance mode (or, in IE’s case, quirks and *more quirks* mode)?

When a document is served as application/xhtml+xml, AFAIK, Mozilla will always render in standards-compliance mode with or without a doctype; but if you deliver that same document (assuming XHTML 1.0 Strict) as text/html, then both IE and Mozilla need the doctype to determine that they should attempt to render in standards mode.
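For reference, the text/html doctype in question looks like this; the full public identifier plus system URL is what keeps IE6 and Mozilla out of quirks mode:

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```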

58
David says:
June 22, 05h

Stef, the phrase you’re looking for (instead of “ergonomy”) is “human factors.” Ergonomics deals specifically with physical interaction–the shape of the keyboard and mouse, for instance, whereas human factors deals with the mental/visual/auditory aspects. The general rule is everything below the neck is ergonomics, and everything above the neck is human factors. Generally, websites can’t affect ergonomics, but they most definitely will affect human factors.

59
David says:
June 22, 07h

I wrote up another big article on this whole issue, which may or may not make sense to the rest of you. It’s at http://wadny.com/news/during/2004/6/22/1145/

I tied in this discussion on a central repository for standards-based design to the issue of the industry as a whole, which sort of relates to what Dave wrote about in the first place–designers vs. coders–in that a more mature industry can deal with specializations and develop processes for letting people with different skill sets work together.

60
June 22, 09h

Jacques Distler wrote: “Have I got that right?”

Yes, except I still disagree on the absolute need of entities for no-break space and integrals. It is an input method issue, not a wire format issue.

Earlier, I posted a link to a template engine paper, which contained quite a few no-break spaces. Those were written in OpenOffice without typing entity references. E.g., Mathematica offers an input layer for integrals.

Lachlan Hunt wrote:

“There is a reason for including a doctype… It’s called doctype switching.”

What I said about doctype was in the context of application/xhtml+xml. I am quite aware of doctype sniffing on the text/html side. (See http://iki.fi/hsivonen/doctype ;-)

61
June 22, 09h

Henri Sivonen wrote:

“We’re so off topic I try to be brief.”

Yes, this is rather off-topic, but does speak to the “bigger picture” of why we’re interested in XHTML in the first place.

I, too, will try to be brief.

You point out that some named entities (e.g. &copy;) have keyboard equivalents, and so authors could just as easily type the corresponding UTF-8 character. Obviously, however, the keyboard is limited, and whether it’s &nbsp; or &ContourIntegral;, authors need to be able to input named entities.

You say, “Fine. Have your CMS replace those with numeric entities or their UTF-8 equivalents, upon ‘publishing’ the document.” (Numeric entities are a little safer, as a naive decode-entities() would replace &amp; with & in the output, which is *not* what you want to do. I don’t know of a commonly-used tool that keeps the XML-safe entities, like &amp; and &lt;, unexpanded while expanding everything else. Maybe someone should write one.)
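Such a tool is a small exercise in any scripting language. Here is a rough sketch in Python (the function name is mine, and it uses the bundled HTML 4 entity table, so MathML names like &ContourIntegral; would need an extra mapping): it expands named entities to numeric references while leaving the five XML-predefined ones (&amp;, &lt;, &gt;, &quot;, &apos;) untouched.

```python
import re
from html.entities import name2codepoint

XML_SAFE = {"amp", "lt", "gt", "quot", "apos"}

def expand_named_entities(text):
    """Expand named entities to numeric character references,
    leaving the five XML-predefined entities alone so the
    output is still safe XML."""
    def repl(match):
        name = match.group(1)
        if name in XML_SAFE or name not in name2codepoint:
            return match.group(0)  # keep as-is
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

expand_named_entities("fish &amp; chips&nbsp;&copy; 2004")
# 'fish &amp; chips&#160;&#169; 2004'
```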

In essence, you should assume the client is using a lowest-common-denominator non-validating XML parser, and is ignorant of the named entities defined in whatever XML dialect your document is written in (even those defined in vanilla XHTML). Expand those named entities before sending the document.

And there’s no point in assigning a DOCTYPE to your document, because the client isn’t going to do anything with that DOCTYPE declaration anyway. (The only “client” which can be assumed to be using a validating parser is the Validator and, as I have argued, you should be automatically validating your document before publishing anyway.)

Have I got that right?

62
s t e f says:
June 22, 12h

I’m a big fan of the middle-way solution, so to speak.

We’ve just redone quite a big corporate site for the company I work for, and we ran into several no-gos, validation-wise (like embedding Flash, although I *know* there are valid ways to embed it, or frames, though frames are a usability/accessibility issue rather than a validity problem).

In the end we scaled back a lot of what we wished for, and went for a middle way that would more or less satisfy all the actors on the project: functional aspects (it’s a web application), aesthetic and ergonomic aspects (is ‘ergonomy’ an English word? If not, I mean the same as ‘usability’), validation aspects, accessibility aspects.

I’ve been evangelising a lot in the company about accessibility and good markup, which go together. I’m always prone to say that well thought-out, valid markup is already 80% accessible.

Yet in the end, let us not forget a few things:
- real-life website development is never ideal: you don’t have an infinite amount of time to do the work, and it’s never done in a hermetic cell of ideal, pure markup (think heterogeneous sources, multiple actors, more than five Java teams, etc.).
- the web is only the web, and as much as I love it and spend most of my waking hours thinking about it, it’s *only* one of many technological artifacts of modern life.

So all in all, I never take part in flame wars, although I sometimes have strong feelings to defend one point or the other. In the end, it’s *just the web*, and let’s be honest about it, there are many more serious subjects to get all hot about. Politics, etc.

It’s a matter of being reasonable, of knowing that what one does is not perfect and will hopefully be better next time. Peace to all men and women of good will. ;)

Let’s take a practical example: I wrote a little library that rewrites spip’s HTML generator so that the content validates (side note: spip is a CMS, see http://www.spip.net/ ). Yet this rewriter *voluntarily* doesn’t handle a few aspects, like “&” entities in URLs. I’ve seen a lot of arguments *for* entity-encoding, but no one tested whether the result was still usable in, say, Google. I tested, saw that encoded “&” entities did not work in Google, and decided to leave it at that.
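For what it’s worth, the validator’s complaint here is purely about the markup layer: a raw “&” in an attribute value must be written “&amp;”, and a browser decodes it back before requesting the URL (what a crawler does with it is another matter, as the Google test above shows). A toy sketch, with an illustrative function name:

```python
def escape_href(url):
    """Escape a URL for use inside an (X)HTML attribute value.
    The browser undoes this before following the link.
    Naive: assumes the input is not already escaped."""
    return url.replace("&", "&amp;")

escape_href("/search?q=css&page=2")
# '/search?q=css&amp;page=2'
```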

As a result, a few geeks every now and then write and complain that my home page is not valid, yet they don’t even read the validator’s details or try to understand why.

In the end I’ve become a contributor to the spip community, and had a chance to discuss a lot with the founders about why they chose i and b in lieu of em and strong, and we came to an agreement that in this particular situation it’s the better solution. Sane, adult discussion is so much nicer. One feels more intelligent in the end, if only because of brain osmosis ;)

The main lesson I drew from production-size web development is exactly that: try to understand, tolerate, expect better things in the future.

And most of all: open your window, take in a big breath of fresh air, and never *ever* lose your temper :)

63
June 23, 05h

Hi Folks,

we should share, so let’s share a little bit. :-)
I searched for a little script that takes my images and presents them in a nice and simple way.

I found the slideshow script by Justin Blanton (http://justinblanton.com/projects/slideshow/) and edited it a “little” bit.

Features:
- simple installation:
  - one (main) directory for the enlarged images
  - one subdirectory for the thumbnails (if needed)
  - one PHP file in the main directory
  - edit the PHP and the CSS file
- you get a slideshow
- you can use thumbnails if you want
- a little “tip a friend” form is included
  (without a form processor)
- XHTML Strict without tables
- you can switch some little features on/off
- CSS is used to apply the look

A demo of the latest version is at http://xi.pair.com/dario77/test2/

A demo of an older version with a different CSS is at
http://xi.pair.com/dario77/test/

Download:
http://xi.pair.com/dario77/test2/asg1.1.zip

The form is not working because there is no form processor installed. (Should be no problem for you.)

This is not the final version; I still have to clean it up, and maybe add more CSS templates…

If you like it, use it.

Greetings from Austria

Rene

64
June 23, 06h

Henri Sivonen wrote:

“Yes, except I still disagree on the absolute need of entities for no-break space and integrals. It is an input method issue, not a wire format issue.”

Absolutely. Just as you *insist* that we not make assumptions about the capabilities of the client and ensure that our document works in a lowest-common-denominator non-validating XML parser, I insist on the ability — in a pinch — to open up that document and edit it in “vi”.

There’s a tension between those two goals, but that’s life.

As to the question we started with: what should one do with an XHTML 1.1 document containing the <embed> element? The bottom line is there seems to be no point in defining a compound “XHTML 1.1 + Embed 1.0” DOCTYPE to send the document with. You have argued (persuasively, at least to me) that one should simply send it out with an “XHTML 1.1” DOCTYPE, or with no DOCTYPE declaration at all.

That not only works best with existing browsers (as I said above), but will continue to work best with future browsers for the foreseeable future.

To heck with what the W3C Validator might say.

65
s t e f says:
June 23, 12h

David,

Thanks for the clarification. In French, ‘ergonomics’ deals with usability and extends from physical artifacts to the metaphorical (like human-computer interfaces, etc.), so I thought it was the same in English.

Mezzoblue, the site where people learn new words ;)

66
amd says:
June 30, 08h

I’ve written an XML syntax checker which checks whether the input has correct syntax. Check it here:
http://dotgeek.org/highfive/display.php?snippet=25
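The core of such a well-formedness check fits in a few lines. This sketch uses Python’s bundled Expat bindings (it is my illustration, not amd’s actual code); since no DTD is read, it rejects anything a non-validating XML parser would choke on, including undefined named entities like &copy;:

```python
import xml.parsers.expat

def well_formed(text):
    """True if the input is well-formed XML; no DTD validation,
    so only the five predefined entities are recognized."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False

well_formed("<p>fish &amp; chips</p>")   # True
well_formed("<p>fish & chips</p>")       # False: bare ampersand
well_formed("<p>&copy; 2004</p>")        # False: undefined entity
```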

68
July 02, 10h

Jacques Distler wrote:

“There’s a tension between those two goals, but that’s life.”

There’s a tension between XML and apps that are not Unicode-savvy.

“The bottom line is there seems to be no point in defining a compound “XHTML 1.1 + Embed 1.0” DOCTYPE to send the document with.”

I’d like to emphasize “send”. Such a DTD could be used for validation within the CMS. However, when validation is internal to one’s CMS, one might as well upgrade to Relax NG.

“You have argued (persuasively, at least to me) that one should simply send it out with an “XHTML 1.1” DOCTYPE,”

I don’t think I did. At least I did not intend to.

“or with no DOCTYPE declaration at all.”

That’s my suggestion for application/xhtml+xml.

For text/html, there’s a need for a doctype that yields the desired layout mode. (Cf. the WHAT WG form draft.)

69
July 03, 03h

Well, this is vey definitely a work in progress, with much more on the way (nesting and attribute checking is next on the list), but at http://www.ilovejackdaniels.com/PHP/On-The-Fly_Validation I’ve put together (and made public a few days ago) a system for automating the fixing of some common validation errors. It takes care of ampersands, closing slashes where needed, and a few other tricks. It also replaces <b> with <strong> and <i> with <em>, but only because I despise <b> and <i> tags …