
In praise of PDF

September 24, 2001


Look, I know I ranted about inappropriate use of PDF earlier. That doesn’t mean I disapprove of PDF in general. In fact, when used appropriately, I love PDF.

HTML and XML are great for certain kinds of content. But what if it’s a 50-page report and you expect people to print it off to read it anyway? What if it’s got lots of mathematical content? Mathematical Markup Language might be at Version 2.0 as a W3C recommendation, but almost nothing actually supports it, and you can’t rely on it for content that might be viewed by the general (academic) public. LaTeX and Mathematica both have HTML export capability, but the results can be haphazard. Besides, who wants 4000 GIFs to represent those equations? So academics use PostScript or PDF. Companies use PDF for research reports and brochures for similar reasons.

PDF has a reputation for being bloated, and rightly so, given the kinds of PDF documents you see on the Web. It needn’t be true in all circumstances, though. PDFs can be quite trim, even if there are embedded fonts in them. If you are prepared to stick to the standard fonts (Times, Helvetica, Courier), they can be even trimmer. A ten-page Word document might be 80KB, but if you don’t embed the fonts, the PDF version will be more like 40KB. Given the plethora of pages weighing in at more than 100KB with all the ad banners and JavaScript and quadruply nested tables with 200 spacer GIFs, that doesn’t seem so bad.
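To see how little a PDF needs when it sticks to a standard font, here is a rough sketch in Python. The `minimal_pdf` function is my own illustration, not any library's API. It writes a one-page PDF that refers to Helvetica by name only. Helvetica is one of the 14 standard fonts a PDF viewer is expected to supply, so no font program gets embedded, and the whole file comes in well under a kilobyte. Real documents are obviously bigger, but the fixed overhead of the format is tiny.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` in Helvetica, with no embedded font.

    Hypothetical helper for illustration only; assumes `text` is plain ASCII.
    """
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 12 Tf 72 720 Td ({escaped}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # A standard font is referenced by name only: no font program is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


print(len(minimal_pdf("Hello, world")), "bytes")
```

Embedding a font would add the font program (often tens of kilobytes) as another stream object, which is where most of the weight in a font-embedded PDF comes from.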

I know that certain people will claim that using PDF will make your Web site 300 percent less usable, and probably reduce the bounce and shine of your hair by 47 percent as well. That’s because they were comparing inappropriate uses of PDF with appropriate uses of HTML. I could do a study using something from, say, a mathematics journal or a series of working papers in economics or statistics. I could present it in PDF and in some crappy HTML output and see which one was easier for people in the field to use. (What’s “300 percent less usable” supposed to mean anyway? That 60 percent of people reading an HTML page understood its content, but only 20 percent of a similar group reading the same content in PDF did? It would be nice if the gurus would say what their quantified abstract nouns mean, wouldn’t it?)
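Taking those hypothetical 60 percent and 20 percent figures at face value, the arithmetic shows why the phrase falls apart:

```latex
\begin{aligned}
\frac{60}{20} &= 3 && \text{HTML is 3 times as usable, i.e. 200 percent \emph{more} usable} \\
\frac{20 - 60}{60} &\approx -0.667 && \text{PDF is about 67 percent \emph{less} usable} \\
60 - 3 \times 60 &= -120 && \text{a literal ``300 percent less'' leaves a negative share}
\end{aligned}
```

The same two numbers can be reported as “200 percent more” or “67 percent less”, and anything beyond “100 percent less” would mean fewer than zero readers understood the page.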

To be fair, Nielsen’s guidelines on PDF usage are pretty sensible. In a similar vein, the rule of thumb I use is this: If you can imagine it in HTML and the thought of reading the whole document on screen doesn’t make you scream, use HTML. If it’s more than ten printed pages at 11-12pt single-spaced, or it’s intellectually difficult material intended for careful reading by a specialist audience, use PDF.

For those of you who do need to mark up mathematical notation in Web pages, consider what the existing HTML entities can do for you. My testing shows that most browsers on the market today can render Greek letters and an array of mathematical symbols using standard HTML entities. Some browsers (like IE4.5) only manage with decimal-coded entities, like &#946; instead of &beta;. As usual, Netscape 4 is the drag: it only accepts decimal-coded entities for things like curly quotes and doesn’t recognise these mathematical symbols or Greek letters at all. Still, it would be nice to be able to put some scholarly articles on the Web in lightweight HTML, with each symbol taking up only five or six bytes, instead of the few hundred bytes it takes to store a small GIF displaying the symbol. You could use the same GIF for 100 instances of the same symbol, but the markup to show it will still take more characters than an HTML entity, and you have to worry about the GIF being a sensible size relative to the viewer’s default font size.
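If you generate such pages from a script, the conversion is easy to automate. Here is a minimal Python sketch (the two function names are my own, purely hypothetical) that turns Greek letters and mathematical symbols into either named entities or the decimal-coded form that IE4.5 insists on. It uses only the standard library’s `html.entities.codepoint2name` table, which holds the HTML 4 named entities; any character without a name falls back to a decimal code.

```python
from html.entities import codepoint2name


def to_decimal_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal-coded entity, e.g. &#946;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def to_named_entities(text: str) -> str:
    """Prefer named entities such as &beta;, falling back to decimal-coded ones."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # HTML 4 name, e.g. "beta"
        else:
            out.append(f"&#{cp};")  # no HTML 4 name, so use the decimal code
    return "".join(out)


formula = "α + β ≤ ∑ γ"
print(to_decimal_entities(formula))  # &#945; + &#946; &#8804; &#8721; &#947;
print(to_named_entities(formula))    # &alpha; + &beta; &le; &sum; &gamma;
```

Plain ASCII passes through untouched, so characters like & and < still need escaping separately if your source text contains them.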

Sophistication no one sees

Of course, PDF isn’t just about printing stuff out, but few people seem to realise this. It’s capable of many advanced functions for on-screen use, including automated side navigation panels, fill-in forms and JavaScript. LaTeX users can ditch Microsoft PowerPoint in favor of free packages like Pdfscreen and do their presentations in PDF. There is significant hypertext functionality in PDF, so you can easily add hyperlinks to footnotes, table of contents items and other documents. Somebody even wrote a calculator in PDF and JavaScript, although you have to wonder why.

There are a few good books on this, particularly Web Publishing with Acrobat/PDF by Thomas Merz, and The LaTeX Web Companion by Goossens and his colleagues. I can’t claim to be an expert on any of this, although I have done a few PowerPoint-free presentations. It’s something I’d like to investigate more – and could, if hammering down browser bugs didn’t occupy so much of my time.

Through the Quartz prism

One of the reasons users love OmniWeb is that it leverages the beautiful text antialiasing algorithms in Mac OS X so that pages look really good. The display technology behind Mac OS X is essentially PDF, and PDF is a native format on that OS. So professional Mac users have a head start on the competition in exploiting PDF’s capabilities. I’ll be watching with interest to see what turns up, and whether Mac OS X users will soon be able to easily produce documents that anyone can read. Already there’s TeXShop, a TeX/LaTeX front end that uses PDFLaTeX natively to generate sophisticated PDF documents. Unfortunately, you need to install the Unix TeX distribution, teTeX, and doing so practically requires an advanced physics degree in itself. Fortunately, the instructions that come with TeXShop are clearer than your average documentation for a LaTeX package or distribution. I’d be interested in hearing about other OS X applications that make the most of the Quartz underpinnings. Drop me a line.

Still, as we are finding out, there are differences between what Apple has implemented from the public specification for PDF and what Adobe does in its own products like Acrobat and Acrobat Distiller. It remains to be seen what this implies for the functionality of Mac OS X.

Why can’t the Web be like PDF?

Browser manufacturers could learn a lot from Adobe. Acrobat Reader is trimmer than most browsers, yet it faithfully renders documents of much greater complexity. It’s more stable. It handles typography in much more sophisticated ways. PDF also supports interaction through JavaScript in much the same way Web pages do, and PDF forms are arguably more manageable than clunky old HTML forms.

In fact, I’d like to see a browser from Adobe. They’re smart enough not to take on Microsoft on the turf it was willing to risk antitrust action to win, so I don’t think it will ever happen. I also think they are more concerned with “dee-ziner” flourishes than standards compliance or trim code, if GoLive is any indication.

But for everyone who’s sick of 10-point Verdana, for everyone who would like their browser to kern and hyphenate text the way a professional typographer can, an Adobe browser would be a joy. When those 200-300 dpi screens become commercially viable, this will be something we beg for.

Of course, another reason why this would never happen is that Acrobat Reader can rely on PDF files being syntactically correct. Browsers cannot, because of the sloppy practices fostered by bad tools and bad attitude.

PDF, like HTML, is widely accessible. There are readers for almost every platform, including handhelds. Visually impaired Windows users can have the content read to them. And the readers are free; only the creation software tends to cost, unless you’re good at TeX/LaTeX.

Wouldn’t it be ironic if the cross-platform promise of the Web ended up being delivered by PDF, the work of one company?

— CodeBitch (codebitch@macedition.com) is the grumpy cow who does the HTML production for MacEdition.
