Why you should escape your ampersands

Molly's recent Searching for Standards mentions that AOL's search engine doesn't validate because of unescaped ampersands. I've seen a lot of this myself, so I thought I'd relate an experience from a few years ago, at my previous employer, where I learned the hard way that I should escape my ampersands.

The project called for a set of parameters to be passed through the querystring, one of which was named "sect", short for "section." The final link looked something like this:

<a href="http://www.domain.com/page.asp?param=1&sect=2">My Link</a>

Much to my surprise, the link didn't work. Instead, the browser would interpret it as something like this:

<a href="http://www.domain.com/page.asp?param=1§=2">My Link</a>

After a little investigation, I discovered that &sect (or, more properly, &sect;) is the character entity for the section sign (§). So when the browser came across my anchor, it took the contents of the href attribute and did what it was supposed to do: it parsed the value, found what looked like a character entity, and replaced it with the character it stands for. Of course, by turning the &sect portion of my querystring into §, it broke my link.
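
For comparison, here is a sketch of the same link with its ampersand escaped as &amp; (same placeholder domain and parameters as above). The browser decodes &amp; back to a literal & when it reads the href, so the server should still receive param and sect as two separate parameters:

<a href="http://www.domain.com/page.asp?param=1&amp;sect=2">My Link</a>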

Now, that's one real-world example of how not escaping your ampersands can cause you trouble. You may be thinking, "Yeah, but what if I don't use the word sect in my querystrings?" Well, consider this: there are many more character entities that can pose a danger to your code. What if you were passing currency information in parameters named cent, yen, or pound, which collide with &cent;, &yen;, and &pound;? Or what if you wanted a parameter named not, which collides with &not;?
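
To illustrate, here is a sketch with hypothetical parameter names (price, cent, and not are made up for this example and don't come from the original project). Written with bare ampersands, &cent and &not are at risk of being read as ¢ and ¬:

<a href="http://www.domain.com/page.asp?price=5&cent=50&not=1">My Link</a>

With every ampersand written as &amp;, there is nothing left for the browser to mistake for an entity:

<a href="http://www.domain.com/page.asp?price=5&amp;cent=50&amp;not=1">My Link</a>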

The bottom line is, if you don't escape your ampersands by writing them as &amp; in your markup, you could end up with broken code.