In praise of PDF
September 24, 2001
Look, I know I ranted about inappropriate use of PDF earlier. That doesn’t mean I disapprove of PDF in general. In fact, when used appropriately, I love PDF.
HTML and XML are great for certain kinds of content. But what if it’s a 50-page report and you expect people to print it off to read it anyway? What if it’s got lots of mathematical content? Mathematical Markup Language might be at Version 2.0 as a W3C recommendation, but almost nothing actually supports it, and you can’t rely on it for content that might be viewed by the general (academic) public. LaTeX and Mathematica both have HTML export capability, but the results can be haphazard. Besides, who wants 4000 GIFs to represent those equations? So academics use PostScript or PDF. Companies use PDF for research reports and brochures for similar reasons.
I know that certain people will claim that using PDF will make your Web site 300 percent less usable, and probably reduce the bounce and shine of your hair by 47 percent as well. That’s because they were comparing inappropriate uses of PDF with appropriate uses of HTML. I could do a study using something from, say, a mathematics journal or a series of working papers in economics or statistics. I could present it in PDF and in some crappy HTML output and see which one was easier for people in the field to use. (What’s “300 percent less usable” supposed to mean anyway? That 60 percent of people reading an HTML page understood its content, but only 20 percent of a similar group of people did when reading the same content in PDF format? It would be nice if the gurus would say what their quantified abstract nouns mean, wouldn’t it?)
To be fair, Nielsen’s guidelines on PDF usage are pretty sensible. In a similar vein, the rule of thumb I use is this: If you can imagine it in HTML and the thought of reading the whole document on screen doesn’t make you scream, use HTML. If it’s more than ten printed pages at 11-12pt single spaced, or it’s intellectually difficult material intended for careful reading by a specialist audience, use PDF.
For those of you who do need to mark up mathematical notation in Web pages, consider what the existing HTML entities can do for you. My testing shows that most browsers on the market today can render Greek letters and an array of mathematical symbols using standard HTML entities. Some browsers (like IE4.5) only manage with decimal-coded entities, like &#946; instead of &beta;. As usual, Netscape 4 is the drag: it only accepts decimal-coded entities for things like curly quotes and doesn’t recognise these mathematical symbols or Greek letters at all. Still, it would be nice to be able to put some scholarly articles on the Web in lightweight HTML, with each symbol taking up only five or six bytes, instead of the few hundred bytes it takes to store a small GIF displaying the symbol. You could use the same GIF for 100 instances of the same symbol, but the markup to show it will still take more characters than an HTML entity, and you have to worry about the GIF being a sensible size relative to the viewer’s default font size.
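To make the size comparison concrete, here is a minimal sketch in Python (my choice for illustration, not something the browsers care about). It uses the standard library's table of HTML 4 entity names to produce both the named and the decimal-coded form of a symbol, and sets them beside some image markup. The `gif_markup` helper and its file naming scheme are hypothetical, just a stand-in for whatever `<img>` tag you would actually write.

```python
from html.entities import name2codepoint


def named_entity(name: str) -> str:
    """Named form, e.g. 'beta' -> '&beta;'."""
    return f"&{name};"


def decimal_entity(name: str) -> str:
    """Decimal-coded form of the same character, e.g. 'beta' -> '&#946;'."""
    return f"&#{name2codepoint[name]};"


def gif_markup(name: str) -> str:
    """Hypothetical image markup; the file naming scheme is an assumption."""
    return f'<img src="{name}.gif" alt="{name}">'


if __name__ == "__main__":
    # Print the markup length for each way of showing a few symbols.
    for name in ("alpha", "beta", "sum", "infin"):
        for markup in (named_entity(name), decimal_entity(name), gif_markup(name)):
            print(f"{len(markup):3d} characters  {markup}")
```

Note that this counts only the markup in the page. The GIF file itself is extra, and the decimal-coded form is the one to emit if you need to cater for the fussier browsers.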
Sophistication no one sees
There are a few good books on this, particularly Web Publishing with Acrobat/PDF by Thomas Merz, and The LaTeX Web Companion by Goossens and his colleagues. I can’t claim to be an expert on any of this, although I have done a few PowerPoint-free presentations. It’s something I’d like to investigate more – and could, if hammering down browser bugs didn’t occupy so much of my time.
Through the Quartz prism
One of the reasons users love OmniWeb is that it leverages the beautiful text antialiasing algorithms in Mac OS X so that pages look really good. The display technology behind Mac OS X is essentially PDF, and PDF is a native format on that OS. So professional Mac users have stolen a march on the competition in leveraging PDF’s capabilities. I’ll be watching with interest to see what turns up, and whether Mac OS X users will soon be able to easily produce documents that anyone can read. Already there’s TeXShop, a TeX/LaTeX front end that uses PDFLaTeX natively to generate sophisticated documents in PDF. Unfortunately, you need to be able to install the Unix TeX distribution, teTeX, which requires an advanced physics degree in itself. Fortunately, the instructions that come with TeXShop are clearer than your average documentation for a LaTeX package or distribution. I’d be interested in hearing about other OS X applications that make the most of the Quartz underpinnings. Drop me a line.
Still, as we are finding out, there are differences between what Apple has implemented from the public specification for PDF, and what Adobe does in its own products like Acrobat and Acrobat Distiller. It remains to be seen what this implies for functionality of Mac OS X.
Why can’t the Web be like PDF?
In fact, I’d like to see a browser from Adobe. They’re smart enough not to take on Microsoft on the turf for which it risked antitrust action, so I don’t think it will ever happen. I also think they are more concerned with “dee-ziner” flourishes than standards compliance or trim code, if GoLive is any indication.
But for everyone who’s sick of 10-point Verdana, for everyone who would like their browser to kern and hyphenate text like a professional typographer can, an Adobe browser would be a joy. When those 200-300 dpi screens become commercially viable, this will be something we beg for.
Of course, another reason why this would never happen is that Acrobat Reader can rely on PDF files being syntactically correct. Browsers cannot, because of the sloppy practices fostered by bad tools and bad attitude.
PDF, like HTML, is widely accessible. There are readers for almost every platform, including handhelds. Visually impaired Windows users can have the content read to them. And the readers are free; only the creation software tends to cost, unless you’re good at TeX/LaTeX.
Wouldn’t it be ironic if the cross-platform promise of the Web ended up being delivered by PDF, the work of one company?