HTML: The “M” is for memo
July 2, 2001
As a Web author who cares about standards, I find few things more irritating than seeing PDF documents where HTML would be more appropriate.
I am so tired of seeing simple documents that obviously came from Microsoft Word delivered in PDF format instead of lightweight Web formats. Half the time, the authors don’t even optimise the PDF file properly.
Many of the Word documents that end up in PDF are not complex. They are formatted in Times or Helvetica/Arial, they are linear in structure, they are not heavily designed. Essentially, they are corporate memos and press releases: things that shouldn’t look like printed documents when they are on screen. They contain a few headings, paragraphs and bullet points, maybe a corporate logo and a link or two – nothing that wasn’t in HTML 2.0, let alone HTML 4.0. (Well, there might be a table as well.)
So why do people use Word to produce these documents, and why do they shy away from HTML as the final output format?
Most of the people who create these unnecessary PDFs are not computer buffs. Computers are a tool they have to use every day in their corporate jobs. They’ve mastered Word or PowerPoint, and they use e-mail a lot. But this Web thing is still new to them (I wish I had a buck for every colleague to whom I’ve had to explain that you can single-click on a Web page link, not double-click). They have no desire or time to learn markup or use text editors. Their employers won’t buy them a GUI Web authoring tool like Dreamweaver, and they don’t know about the free HTML authoring tools that exist – which could be a lot better in any case. So they use Word.
If your document is already in Microsoft Word format, it might be tempting to use PDF for its online delivery. Word’s “Export to HTML” function is pretty much a joke. Microsoft Word 97 and 98 can’t even generate HTML tags that nest properly, and they don’t recognise the standard “Heading 1” and “Heading 2” Word styles as things that should be marked up with H1 and H2 tags. Word 2000 and its Office counterparts spew out so much weird XML-inspired crud, even using the Compact HTML option, that you would think Microsoft is trying to give XML a bad name. These problems can be fixed pretty easily with some cleanup tool like HTML Tidy or the Dreamweaver “Clean up HTML” command. But the people creating these documents have never heard of the W3C, let alone its Tidy tool, and they don’t have Dreamweaver. So instead they export to PDF.
Similarly, a lot of people worry that their Web documents don’t look “the same” as the print documents they base their expectations on. Well, of course they look different – they’re on a screen: low resolution; emitted light, not reflected light; and landscape, not portrait. Nonetheless, these people save their documents as PDF for online delivery, out of fear of losing that pixel-perfect concordance with what came out of their printer. Considering so many Word documents are ill-designed messes full of haphazard formatting, I have to question whether this is a sensible goal, but there you go. The result: big fat PDFs, badly designed and laid out, not optimised, slow to load and completely unnecessary.
CSS to the rescue
If the corporate IT department or a tech-savvy person in the business end of the organisation knew CSS, a lot of standard business communications could be streamlined and trimmed down into native HTML format for both editing and distribution. Oh, I know it would be even better to have some fancy-schmancy XML solution with FROM and TO tags defined in the Document Type Definition (DTD) and all that rot.
But HTML and CSS will generate attractive, efficient corporate documents right now. In fact, you can even replicate the look of your printed documents using CSS2, while having an on-screen style that’s better suited to on-screen reading.
That’s what the @media constructs in CSS are for. Set up a standard stylesheet for your memos, with an attractive layout, sans serif font and colored backgrounds, and add an @media print section that respecifies everything as your corporate standard font (don’t tell me – Times! Did I guess right?) and black text on a white background. Since this is a stylesheet for print, you can even use point-size specifications, which as we should all know by now are fraught with problems for on-screen use. Eric Meyer, author of the O’Reilly book Cascading Style Sheets: The Definitive Guide, has a good tutorial on this at WebReview.
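As a rough sketch of the idea, here is what such a memo might look like as a single HTML file with both on-screen and print rules. The memo text, the colours and the font choices are all made up for illustration; substitute your own corporate standards.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Memo: Office closure</title>
<style type="text/css">
/* On screen: sans serif, relative sizes, coloured background */
@media screen {
  body {
    font-family: Verdana, Arial, sans-serif;
    font-size: 100%;
    color: #003;
    background: #eef;
    margin: 2em 10%;
  }
  h1 { font-size: 150%; }
}

/* On paper: the corporate serif font, black on white, sizes in points */
@media print {
  body {
    font-family: "Times New Roman", Times, serif;
    font-size: 12pt;
    color: #000;
    background: #fff;
    margin: 0;
  }
  h1 { font-size: 18pt; }
}
</style>
</head>
<body>
<h1>Memo</h1>
<p><strong>To:</strong> All staff<br>
<strong>From:</strong> Facilities</p>
<p>The office will be closed on Friday. Please note:</p>
<ul>
<li>The car park will also be closed.</li>
<li>E-mail will be available as usual.</li>
</ul>
</body>
</html>
```

The markup is identical for both media; only the stylesheet changes what the reader sees on screen or on paper.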
Alternate stylesheets are part of the CSS specification, but only the newest browsers support them so far. The @media construct is somewhat better supported; how you do this depends on whether you are developing for intranets or public sites, and what kinds of browsers your audience uses.
Alternate stylesheets for print are also useful for those “printer-friendly” versions of pages. Instead of having two versions of a page, or some sort of ASP or Perl business to respecify the look, you can simply specify that all those navigation elements and ad banners disappear when the user prints the page. Users don’t have to click on a link and go through a new page load just to print out the article; it just works. A List Apart has done this in its recent redesign, and so can you.
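A minimal sketch of that printer-friendly trick, assuming the navigation and banners carry an id and a class you can hook into (the names "navigation" and "banner", the links and the image file are hypothetical). This version uses the media attribute on the STYLE element, which HTML 4 allows as an alternative to an @media block:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Example article</title>
<style type="text/css" media="print">
  /* Applied only when printing: drop the page furniture */
  #navigation, .banner { display: none; }
  body { color: #000; background: #fff; }
</style>
</head>
<body>
<div id="navigation">
  <a href="/">Home</a> | <a href="/archive/">Archive</a>
</div>
<div class="banner">
  <img src="banner.gif" alt="Advertisement">
</div>
<h1>Article title</h1>
<p>The article text, which is the only part that ends up on paper.</p>
</body>
</html>
```

The same rules could just as well live in a separate file pulled in with a LINK element that has media="print", so every page on the site shares them.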
The missing toolkit
The problem with this proposal is that your average conservative corporation is not going to shell out on a new tool for generating standard corporate memos when it already has a site licence for Word. So we need a tool that is free, works like a fairly normal Mac or Windows application, and generates valid markup and CSS instead of font tags and improperly nested debacles.
We need something we can smuggle in under the IT department’s noses. I am yet to find such a product. For FrontPage Express, read, “Another crappy MS tool that spits out bloated proprietary markup.”
I had high hopes for the W3C’s own Amaya, but its Windows version is quirky and it has no Mac version at all.
Perhaps the most promising candidate is good old Netscape Composer. But it doesn’t do CSS very well, generates FONT tag-heavy markup, doesn’t follow standards in the widely used 4.x version and isn’t really designed for a world of document templates. Moreover, since many corporate IT departments are Microsoft shops, a free editing tool that happens to involve the “other” browser is likely to be verboten. Maybe once the iCab guy implements CSS for its final version, he can turn his attention to an authoring package. The Opera people might be good candidates for this task, too, except that they are unlikely to send it out for free. If you know of such a product, please drop me a line.
So it’s a shame that Microsoft has flubbed the opportunity to create a lean, standards-compliant tool for creating basic documents with HTML/XHTML as its native format. Yes, I know that the company says that the future is XML, but we’ve seen its current efforts, and the confused state of the .NET initiative suggests that it hasn’t bought a clue on this yet.
PDF has a role
That said, I don’t want to dismiss PDF as a format entirely. My problem is with its use in situations where HTML would be more appropriate; in other contexts, PDF is a boon. Lately, I’ve been exploring some of PDF’s more advanced features. I’ll discuss them in more detail in a future column.