Creating web pages, the right way
Web scripters: take care of your pages' code!
- 2008-07-01
- User space | Intermediate
Have you ever felt that warm fuzzy feeling of knowing that your code is error-free and complies with the latest standards? In terms of programming skill, web authors are too often seen as the bottom of the barrel (you will notice I didn’t call them ‘web programmers’) because the platform appears so forgiving and so limited. However, they are required to cover a wide range of programming expertise and, even worse, they must ensure that their code runs the same on various platforms, something “real” programmers consider a challenge.
The “bottom of the barrel” indeed!
During the first browser war, from 1997 to 2001, being a web coder was interesting, with the gap between Internet Explorer (IE) 4 and Netscape Navigator 4. The former used “advanced” CSS (how things have changed!) while the latter had interesting scripting capabilities (but its frequent releases were so quirky that they discouraged developers from using them).
But when Netscape was sidelined, web authors were required to write for a single platform (IE) which had a proprietary model–and this sad state of affairs lasted for half a decade. During that time, web coding required you to know very little: Visual Studio (or Frontpage, or Dreamweaver) and the twisted view of the Internet that IE 5/6 gave you.
Recently though, due to the rise of “alternative” browsers (such as KHTML/WebKit-based Safari, Mozilla Firefox, and Opera), writing code only for IE would make any website automatically lose 20% (or more) of the market; and this trend isn’t reversing (even IE, with version 7, is getting slightly more standards-compliant). So web authors must change their habits and code for more platforms. And what do most of these alternatives have in common? Conformance with standards.
Which, authors find out, are quite a challenge to use correctly.
Now, some web authors may try to keep creating “tag soup”. It is perfectly possible to output atrocious code that all current rendering engines will treat the same way–or choke on similarly. However, one may very soon find out that actually creating “good” code is much less of a hassle than tinkering with “tag soup” is.
Creating good mark-up
The basis of a web page is still hypertext mark-up language (HTML). It has seen several variations, with HTML 4.01 and XHTML 1.0 (each further divided into Transitional, Frameset, and Strict versions) being the most interesting to look at. The main difference between these two is that XHTML 1.0 is equivalent in functionality to HTML 4.01, but uses XML syntax instead of HTML’s SGML-based syntax.
Moreover, while HTML 4.01 doesn’t care about the case of tags and parameters, XHTML is case sensitive: all tags and parameters must be in lowercase. In the present document, when talking about a specific tag, I will write its name in uppercase for easier reading. However, tags will be written in lowercase inside code samples.
The three sub-divisions are the same for both languages:
Transitional takes HTML 3.2 (the Frankenstein’s Monster of web specifications) and tries to give it some order: several duplicate functionalities are removed, and some parameters are unified, but the presentational tags and parameters are kept (marked as deprecated) so that older pages can still validate. It is also the first HTML specification that allows you to build a standardized Document Object Model (DOM).
Frameset is Transitional with frames added. Frames used to be useful for early “dynamic” websites, and were kept for a time.
Strict removes most of what made HTML 3.2 a Frankenstein language: anything that is geared towards making HTML something other than a mark-up language was removed (specific style options, tags that required actions from the browser, etc.), and so were frames. Indeed, frames are actually redundant and harmful: they break navigation models and can be advantageously replaced with more generic OBJECT tags (except that those are still not correctly supported under IE 7).
Strict versions of XHTML and HTML are the easiest to learn, and the most similar to each other. I would personally recommend XHTML 1.0 Strict for authoring your pages, because several web browsers are now able to notice incorrect code and report it directly (this happens when the page is served as XML, with the application/xhtml+xml content type).
Once you are done, you’ll be better off (at least for now) with HTML 4.01 Strict, which requires very little modification of XHTML 1.0 Strict code to validate. Besides changing the DOCTYPE declaration, you merely need to remove the slashes (/) from self-contained tags (like BR) and to change the syntax of image maps (rarely used nowadays) slightly. The image map difference was actually an error made when the XHTML syntax was established; most browsers today use the HTML 4.01 syntax for image maps in XHTML 1.0 documents anyway.
For example, while in HTML 4.01 a horizontal ruler (HR) could be written as <HR> or <hr>, in XHTML 1.0 it must be written as <hr/> (or <hr />; note that extra space, so that browsers like IE 4 or Netscape 4 can still understand it).
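To make the comparison concrete, here is a minimal sketch of a complete XHTML 1.0 Strict page. The DOCTYPE and the namespace are the standard W3C ones; the title and the text are placeholders of my own.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Placeholder title</title>
</head>
<body>
  <!-- every tag is lowercase and explicitly closed -->
  <p>First line<br />second line</p>
  <!-- self-contained tags end with a slash -->
  <hr />
</body>
</html>
```

To turn this page into HTML 4.01 Strict, you would swap the DOCTYPE for <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">, drop the xmlns and xml:lang parameters from the HTML tag, and remove the slashes from META, BR and HR.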
Validation, why it will help you
What is validation? Well, it is checking a document against its declared document type: among other things, it ensures that no tags are used that don’t belong in that document type. Here follow a few notable examples:
FRAME is no longer allowed in Strict documents; this tag allowed you to load another HTML document inside a frame. The framed document wouldn’t appear in the browser’s history, and the tag implied a graphical representation. The OBJECT tag is more versatile, as it can open HTML documents, but also images, movies, and applets. It can also be nested so as to provide fall-back content. For example, if a browser can’t open a movie, it might try to open a Flash applet; if Flash is not supported, it might open an animated SVG; and if that’s not supported either, it might open an animated GIF, and so on.
MARQUEE is no longer allowed in Strict documents; this tag allowed you to put a lot of content and have it scroll inside a block element. Parameters defined how fast the scrolling went and in what direction. This required HTML to describe actions, which is contrary to the format’s purpose. It can be replaced with CSS and JavaScript.
The target parameter inside anchors (A) is no longer valid in Strict documents: instead, you must use rel (which defines the relationship between the current document and the document designated by the anchor’s href). Opening a new window must be done specifically through JavaScript, making target="_blank" illegal. However, due to its heavy use, target will remain part of HTML 5.
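The OBJECT fall-back chain and the rel-based link described above could look like the sketch below. The file names, the example.org address and the rel value "external" are assumptions chosen for illustration; "external" is only a common convention, not a value defined by the HTML 4.01 specification.

```html
<!-- Nested OBJECT tags: the browser renders the outermost object it
     supports and ignores everything nested inside it. -->
<div>
  <object data="clip.mpg" type="video/mpeg">
    <object data="clip.swf" type="application/x-shockwave-flash">
      <object data="clip.svg" type="image/svg+xml">
        <object data="clip.gif" type="image/gif">
          <p>Plain text description of the clip, shown as a last resort.</p>
        </object>
      </object>
    </object>
  </object>
</div>

<!-- A link marked with rel instead of target="_blank" -->
<p><a href="http://www.example.org/" rel="external">An outside site</a></p>

<!-- This script opens links marked rel="external" in a new window.
     It must come after the links it handles, for instance at the end of BODY. -->
<script type="text/javascript">
  var links = document.getElementsByTagName('a');
  for (var i = links.length - 1; i >= 0; i--) {
    if (links[i].getAttribute('rel') === 'external') {
      links[i].onclick = function () {
        window.open(this.href);
        return false; // cancel the normal navigation
      };
    }
  }
</script>
```

The loop counts down so that the script contains no < character, which would otherwise have to be escaped in an XHTML document. If JavaScript is disabled, the link simply opens in the current window.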
To generalize, elements that entail an action from the browser or that have more efficient alternatives are being phased out of HTML.
HTML 5, which is in the planning stage, will use HTML 4.01 Strict as a basis. Thus, it is worth your while learning about HTML 4.01 Strict. But this is only one advantage of validation. Next comes proper tag chaining and proper tag nesting.
What’s wrong with badly nested tags? Well, here’s an example:
<b><i>some stuff</b> some other stuff</i>.
The correct form would be:
<i><b>some stuff</b> some other stuff</i>
Nothing too hard! However, you may sometimes want text styled like this:
<b>bold text <i>italicized and bold text</b> italicized text</i>
And what’s wrong here? Well, it is very troublesome.
The DOM is supposed to be a tree. This means that an element can contain several elements, but elements can’t overlap. In the example above, the italic tag is opened inside the bold tag, so it must be closed before the bold tag is. Closing the bold tag first leaves the italic tag unclosed, and the later </i> then closes nothing that is still open. We therefore have two coding errors: an unclosed italic tag, and an unexpected italic closure.
How do browsers deal with this wrong mark-up?
Mozilla Firefox/Gecko considers that bold and italics are closed at the same time on a DOM level, but may apply CSS styles until it reaches the italic closure tag. This helps to get the required appearance and creates a valid DOM tree, but makes some styled elements unreachable through the DOM
KHTML/Webkit acts pretty much the same way as Gecko
Opera closes both tags when the bold node is closed, but then opens a ‘phantom’ italic node until it reaches the italic closure. This creates a valid DOM tree and applies styles similarly to Firefox, but creates unnamed nodes
IE makes the tags overlap: italic is considered both a child and a sibling to bold. This makes any recursive operation on the DOM tree circle endlessly
Of all, Opera’s is probably the closest to the ‘proper’ solution:
<b>bold text <i>italicized and bold text</i> </b><i> italicized text</i>
You could also do:
<b>bold text</b> <b><i>italicized and bold text</i></b> <i> italicized text</i>
for a more semantically correct display.
You could also use CSS for styling (the STYLE tag must be placed inside the document’s HEAD). Remember, I’m using uppercase letters to name tags in my explanations, but writing them in lowercase in actual code. Writing them in lowercase is recommended in HTML 4 and CSS, and you must do so for XHTML:
<style type='text/css'>
.fat {font-weight:bold}
.bent {font-style:italic}
</style>
...
<span class='fat'>bold text </span>
<span class='fat bent'> italicized and bold text</span>
<span class='bent'> italicized text</span>
That’s the cleanest implementation I know (notice that there are only 3 nodes, all on the same tree level, and that the second one uses multiple CSS classes). Instead of spans, you could also use EM and STRONG tags, allowing screen readers to change voice intonation for disabled users.
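As a sketch of that last suggestion, the same three pieces of text could be marked up with STRONG and EM. This assumes the browser’s default style sheet, where STRONG is usually rendered bold and EM italic; the look can still be adjusted through CSS.

```html
<strong>bold text </strong>
<strong><em>italicized and bold text</em></strong>
<em> italicized text</em>
```

The second line nests EM inside STRONG, so the tree is one level deeper than in the SPAN version, but no tags overlap.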
Obviously, properly nesting tags removes a lot of headaches–and explicitly closing all tags (as required by XHTML) helps to remove a great deal of confusion over tag scopes: there’s no more need to remember what tags must be explicitly closed, as all of them must be.
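A short illustration of the difference, using a made-up list: HTML 4.01 lets you leave out some closing tags (such as those of LI and P), while XHTML requires every one of them.

```html
<!-- Valid HTML 4.01 Strict: the closing tags of LI are optional,
     so each item ends implicitly where the next one starts -->
<ul>
  <li>first item
  <li>second item
</ul>

<!-- XHTML 1.0 Strict (also valid HTML 4.01): nothing is implicit -->
<ul>
  <li>first item</li>
  <li>second item</li>
</ul>
```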
Copyright information
Verbatim copying and distribution of this entire article is permitted in any medium without royalty provided this notice is preserved. Code samples included in the article, due to their simplicity and very nature, can be copied without need for any credits, under the terms of any license you may prefer, such as (but not limited to) GPL, LGPL, MPL, Apache or BSD.
Biography
Mitch Meyran: Mitch is a professional computer abuser. He started on this peculiar kink when he got an 8088-based PC, and soon found out that tinkering with computers is almost as fun as using them, provided the OS lets you do it.