The trouble of writing a standards compliant website

One of my tasks at work is to write, enhance and maintain a small website for my boss. Having been given free rein, I decided, of course, to host it on a LAMP server. No trouble there. Not wanting to use outdated technology that would require extensive rewriting after a few years, I decided to stick to standards, and so I learnt XHTML 1.1.

Break a leg, or break a page.

Of nice standards

XHTML 1.1 is the latest W3C-recommended web publishing language. It retains the nice parts of HTML, fuses them with the modularity of XML 1.0, requires you to separate content, presentation and interaction, and provides ways to rapidly create a page that will look the same regardless of the web browser used to view it. In theory.

Of browsers, standards compliance and bragging rights

Right now, there are roughly four browser families representing most of the browsers used to access the web.

  • Trident-based: Internet Explorer and skinned mshtml browsers. Most used. Most buggy. Most standards abusing.
  • Gecko-based: Mozilla, Firefox, Seamonkey, Camino, and others. Sticks to standards mostly, but has gone through countless versions and revisions.
  • Opera: used on PCs sometimes, on mobile devices mostly. Quite standards compliant in general, but it does have a few annoying quirks.
  • KHTML: Apple Safari and Konqueror. Tries to comply with standards, but is still mostly an HTML 4.01 engine.

Of those, only two really understand XHTML 1.1: Gecko and Opera. Both include an XML parser and report an error if the page is not well-formed. Firefox’s source code viewer is really nice.

Of owning one’s mistakes

As a meticulous designer, I wrote the code entirely by hand and discarded the idea of using a WYSIWYG editor. As such, any error in the code has to be corrected by yours truly.

When the time came for me to test and validate my website, I had the horrible surprise of getting slightly different results in Firefox and Opera (easily fixed by reordering my font priorities), unworkable content in Konqueror (which required a slightly more extensive rewrite) and...

A download prompt in Internet Explorer.

Of MIME types, HTTP headers and headaches

You will, of course, tell me this: with XHTML, you need to use the application/xhtml+xml MIME type (sent automatically by Apache if your file ends in .xhtml, or set with a call to PHP’s header() function for .php files). If the browser doesn’t say it supports that one, you can fall back to application/xml or text/xml. If that fails too, you can send different content with the more classical text/html MIME type.
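The fallback chain above can be sketched in a few lines. This is only an illustration in Python, not the code running on my server: the function name pick_mime_type is made up for the example, and it deliberately ignores q-values, looking only at which types the browser mentions by name in its Accept header.

```python
# Preference order described above: the XHTML type first, then the generic XML types.
XML_TYPES = ["application/xhtml+xml", "application/xml", "text/xml"]


def pick_mime_type(accept_header):
    """Return the MIME type to serve for a given Accept header value.

    Hypothetical helper: only types named explicitly are considered,
    and q-values are ignored to keep the sketch short.
    """
    # Keep only the media type of each entry, dropping parameters such as ";q=0.9".
    offered = {
        entry.split(";")[0].strip().lower()
        for entry in accept_header.split(",")
    }
    for mime in XML_TYPES:
        if mime in offered:
            return mime
    # Nothing XML-flavoured was mentioned: fall back to classic HTML.
    return "text/html"
```

A browser that sends "text/html, application/xml;q=0.9" would get application/xml here, and one that mentions none of the three XML types gets text/html.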

However, Konqueror and, above all, Internet Explorer lie about their capabilities: Konqueror will open the file and try to process it as regular HTML (which may succeed or fail), while Internet Explorer will merely ask you to save the file to disk... so that you can open the page with a standards-compliant browser, perhaps.

If you look at what IE says it supports in its Accept header, you’ll get: “*/*”. Meaning it is supposed to support any kind of file (which, of course, it can’t). Meaning that, if you want to provide a scaled-down version of the page, you’ll need to do browser detection, which makes MIME types irrelevant for IE and highly annoying for those browsers that do support them nicely!
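To see why a catch-all wildcard tells the server nothing, here is a small Python sketch. Both function names and both header strings are invented for the example (they are not captured from real browsers). The first function honours wildcards the way the header asks it to; the second only trusts a type the browser spells out.

```python
def claims_support(accept_header, mime):
    """True if the Accept header covers `mime`, wildcards included."""
    main_type = mime.split("/")[0]
    for entry in accept_header.split(","):
        media_range = entry.split(";")[0].strip().lower()
        if media_range in (mime, main_type + "/*", "*/*"):
            return True
    return False


def explicitly_lists(accept_header, mime):
    """True only if `mime` itself appears in the Accept header."""
    return any(
        entry.split(";")[0].strip().lower() == mime
        for entry in accept_header.split(",")
    )


# Hypothetical header values, written for this example only.
wildcard_only = "image/gif, image/jpeg, */*"
explicit = "text/html, application/xhtml+xml, */*;q=0.8"
```

With the wildcard-only header, claims_support says yes to application/xhtml+xml (and to anything else you ask about), while explicitly_lists says no. That gap is exactly what forces extra detection on the server side.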

Konqueror is a bit more truthful: it says that it supports “*/*” but that it prefers text/html; when served application/xhtml+xml, it can try opening it as HTML 4, or start another browser that has registered the application/xhtml+xml MIME type, if one is available. This can be used to set up a script that would, for example, replace all instances of the id= attribute with name=, which would make image maps work. Badly applied CSS on too deeply nested DIVs, radically different font scaling and more can make or break a page.
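A rewriting script of that kind could look like the following Python sketch. It is an assumption on my part that touching only the opening map tags is enough; a blanket replacement of every id= in the page would also hit elements that CSS and scripts rely on. The function name downgrade_image_maps is made up for the example.

```python
import re

# Matches the id attribute inside an opening <map ...> tag, and nothing else.
MAP_ID = re.compile(r'(<map\b[^>]*?\s)id=(["\'])', re.IGNORECASE)


def downgrade_image_maps(markup):
    """Rewrite <map id="..."> as <map name="..."> for HTML 4 era parsers."""
    return MAP_ID.sub(r"\1name=\2", markup)
```

The usemap attribute on the image keeps pointing at the same "#..." value, so only the map element itself needs to change.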

Opera 9 has an integrated XML parser on par with Gecko’s (this wasn’t the case in 8.x); it supports application/xhtml+xml quite well apart from a few bugs (like image maps, which require the older syntax to work), and it displays the page without requiring too many tricks.

Gecko is a big winner here: to support all versions starting from Mozilla 1.0rc3, I merely needed to use one single trick. What a relief.

What you may end up doing

The easy way out would be to rewrite the page in HTML 4.01. Another would be to force IE to support XHTML 1.1 and use only those tags all browsers recognize. (Yes, it’s possible! It requires a dozen additional lines of code and wastes even more bandwidth when it downloads the W3C DTDs, and then it renders the page in Quirks mode, but it does accept application/xml and parse the XML code.) Or else... do as I do: write the pages in XHTML 1.0 Strict following Appendix C, default the MIME type to text/html, and detect those few browsers that understand application/xhtml+xml and send them the higher-level MIME type.
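For the curious, the "default to text/html, upgrade when safe" decision might be sketched like this in Python. It is one possible reading of the approach, not the exact code I use: the function names are invented, the utf-8 charset is an assumption, and "understands application/xhtml+xml" is taken to mean "lists it by name with a q-value above zero", so wildcards never count.

```python
def parse_accept(accept_header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    ranges = {}
    for entry in accept_header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media_range] = q
    return ranges


def content_type_for(accept_header):
    """Serve XHTML as XML only to clients that list the type explicitly."""
    q = parse_accept(accept_header).get("application/xhtml+xml", 0.0)
    if q > 0:
        return "application/xhtml+xml; charset=utf-8"  # charset is an assumption
    return "text/html; charset=utf-8"
```

The returned string is what would go into the Content-Type response header. A stricter version could also compare the q-value against the one given for text/html; this sketch does not.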

Darn it, being a responsible web developer is hard sometimes...

License

Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice is preserved.

With thanks to Perth Counselling for keeping FSM editors sane.