Module 4: Adobe PDF

Adobe's Portable Document Format (PDF) is by far the most common format for letting users view documents without needing the software in which the documents were created. Adobe Acrobat Reader is free, and most web browsers also include a built-in PDF viewer.

PDF documents may range from fully accessible to completely inaccessible, depending largely on how the document was created.

Tagged PDFs

PDF documents can contain semantic elements called tags; the hierarchical structure of tags in a PDF document is called the tag tree. Editing the tag tree requires Adobe Acrobat Pro and is best left to professionals (and even they don't like to do it). Because it is very difficult to edit the tag tree without damaging the file, changes are always best made in the original document.

To create a tagged PDF, start with a Word, PowerPoint, or other file that already includes the appropriate semantic information (e.g., headings, lists, and tables).

  1. Click File > Save As, and select the desired location.
  2. Select PDF from the drop-down menu.
  3. Select Best for electronic distribution and accessibility.
  4. Click Export.

Important note: Make sure to use the Save As option rather than selecting PDF as an option from the Print menu. "Printing" a file as a PDF will strip out all the tags.
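If you want a quick, scriptable way to confirm that an exported file kept its tags, the short Python sketch below is one rough heuristic. It relies on the fact that a tagged PDF's document catalog points to the root of the tag tree through a /StructTreeRoot entry, and it simply searches the file, including any zlib-compressed streams, for that name. The function names are our own (not part of any library), and the check has limits: it cannot see inside encrypted files, and it only tells you that a tag tree exists, not whether the tags are any good. Adobe Acrobat Pro's accessibility tools are still the better way to judge tag quality.

```python
import re
import zlib

# Body of each "stream ... endstream" section in a PDF file.
STREAM = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)


def _searchable_chunks(data: bytes):
    """Yield the raw file, then every stream body that zlib can inflate.

    Newer PDFs often store the document catalog inside a compressed
    object stream, so searching the raw bytes alone can miss it.
    """
    yield data
    for match in STREAM.finditer(data):
        try:
            yield zlib.decompress(match.group(1))
        except zlib.error:
            pass  # not FlateDecode (e.g. a JPEG image) or unreadable; skip it


def looks_tagged(data: bytes) -> bool:
    """Heuristic: a tagged PDF's catalog has a /StructTreeRoot (the tag tree)."""
    return any(b"/StructTreeRoot" in chunk for chunk in _searchable_chunks(data))
```

For example, looks_tagged(open("handout.pdf", "rb").read()) returning False after an export is a hint that the file may have been "printed" to PDF; handout.pdf is a hypothetical file name.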

Scanned PDFs

Unless it has been processed with optical character recognition (OCR), a scanned PDF document is essentially one big picture. A screen reader will interpret it as an image and so cannot provide any usable information to the user.
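As a rough way to tell whether a PDF already has a text layer (because it was created electronically or has been through OCR) or is only a picture, the Python sketch below looks inside the file's streams for a font-selection operator (such as /F1 12 Tf) together with a text-showing operator (Tj or TJ). A page that is just a scanned image draws that image and typically contains neither. This is a heuristic under stated assumptions, with hypothetical function names: it cannot read encrypted files or handle compression other than zlib, and a proper PDF library or a quick test with a screen reader will give a more reliable answer.

```python
import re
import zlib

# Body of each "stream ... endstream" section in the file.
STREAM = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)
# Font-selection operator in a content stream, e.g. "/F1 12 Tf".
FONT_OP = re.compile(rb"/[^\s/\[\]()<>{}%]+\s+[\d.]+\s+Tf\b")
# Text-showing operator: "(...) Tj", "<...> Tj", or "[...] TJ".
SHOW_OP = re.compile(rb"[)>]\s*Tj\b|\]\s*TJ\b")


def _stream_bodies(data: bytes):
    """Yield each stream body, inflated if it is zlib (FlateDecode) compressed."""
    for match in STREAM.finditer(data):
        body = match.group(1)
        try:
            yield zlib.decompress(body)
        except zlib.error:
            yield body  # uncompressed, or a filter this sketch does not handle


def has_text_layer(data: bytes) -> bool:
    """Heuristic: True if some stream both selects a font and shows text."""
    return any(
        FONT_OP.search(body) is not None and SHOW_OP.search(body) is not None
        for body in _stream_bodies(data)
    )
```

For example, has_text_layer(open("scan.pdf", "rb").read()) returning False suggests the file still needs OCR before a screen reader can use it; scan.pdf is a hypothetical file name.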

Where possible, it's always better to have a PDF that was created electronically from the original document. For older documents, such as books and journal articles, it's important to obtain the highest-quality scan possible. OCR processing is only as good as the image that is fed into it, so scans that are low-resolution, crooked, or "dirty" will not return good results.

There are free OCR services available on the web. They aren't as good as high-end book scanners, but they do a pretty good job when given high-quality input.

For instance, if we upload this scanned passage from a Jane Austen novel:

Scanned image of a page from Pride and Prejudice. Text follows.

into the Free OCR site, it yields the following result:

 “Come. Darcy," said he, "I must have you dance. I hate to see you standing about by yourself in this stupid manner. You had much better dance.”
“I certainly shall not. You know how I detest it, unless I am particularly acquainted with my partner. At such an assemth as this. it would be insupportable. Your sisters are engaged, and there is not another woman in the room whom it would not be a punishment to me to stand up with.”
‘I would not be so fastidious as you are." cried Bingley. “for a kingdom! Upon my honour, I never met with so many pleas- ant girls in my life as I have this evening; and there are several of them. you see. uncommonly pretty.”