PDF/UA: Push for More Barrier-free Documents
This is by no means a new issue. Blind people, for example, can use a number of aids, such as screen readers and Braille printers, to access printed and electronic information. The problem is that documents often lack vital structural information, such as reading direction, language, column sequence and syllabication instructions, that is needed to convey the message accurately. How do we resolve homophones when the output is speech, e.g., "I'm afraid not" versus "I'm a frayed knot"? Or what if superfluous information such as headers, footers, page numbers or logo names were read aloud?
For documents to be truly universally accessible, they need to fulfill a number of criteria. Tagging is key: Which text passages and blocks belong together? In what order should the text be read aloud, and how much of it? Furthermore, non-text objects such as images need alternative text. Content that is not part of the actual text, such as page numbers or running headers (artifacts), must be marked as such. The text must also be mappable to Unicode. The new PDF/UA (Universal Accessibility) format, whose certification as an ISO standard is merely a matter of time, should make it substantially easier to create universally accessible documents.
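The criteria above determine what a screen reader can do with a document. The following is a minimal sketch of that idea. It uses a hypothetical, heavily simplified model of a tag tree, not the actual PDF object model. The names `Node` and `speak_order` are illustrative assumptions. The role names (`H1`, `P`, `Figure`) and the `alt`/`lang` fields are modeled loosely on Tagged PDF's standard structure types and its /Alt and /Lang entries. The tree is walked in logical order, and each element yields its text with the nearest language tag. Figures contribute their alternative text, and artifacts are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    """Hypothetical, simplified stand-in for a tagged-PDF structure element."""
    role: str                       # structure type, e.g. "H1", "P", "Figure"
    text: str = ""                  # text content of a leaf element
    alt: Optional[str] = None       # alternative text (cf. the PDF /Alt entry)
    lang: Optional[str] = None      # language tag such as "en-US" (cf. /Lang)
    children: List["Node"] = field(default_factory=list)

def speak_order(node: Node, inherited_lang: str = "en") -> List[Tuple[str, str]]:
    """Return (language, utterance) pairs in logical reading order."""
    lang = node.lang or inherited_lang  # children inherit the nearest language tag
    if node.role == "Artifact":
        # Simplification: real PDFs mark artifacts in the content stream rather
        # than in the structure tree; either way they are not read aloud.
        return []
    if node.role == "Figure":
        # Non-text objects are announced via their alternative text, if any.
        return [(lang, node.alt or "[image without alternative text]")]
    out = []
    if node.text:
        out.append((lang, node.text))
    for child in node.children:
        out.extend(speak_order(child, lang))
    return out

# Hypothetical example document
doc = Node("Document", lang="en-US", children=[
    Node("Artifact", text="Page 1 of 3"),
    Node("H1", text="Annual Report"),
    Node("Figure", alt="Company logo"),
    Node("P", text="Revenue grew in all regions."),
    Node("P", text="Bonjour tout le monde.", lang="fr-FR"),
])

for lang, utterance in speak_order(doc):
    print(lang, utterance)
```

In this toy document, the page number is never spoken and the logo is announced by its description. The French paragraph carries its own language tag, so a speech synthesizer could switch pronunciation for it.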
Tagged PDF already exists, of course, but its structure does not go deep enough. To create truly barrier-free information, you need to "dive deep" into the structure of a document, and conventional PDF tools cannot do that yet. By the time accessible PDF (PDF/UA) becomes a standard, if not sooner, there will also be software for generating such documents. Experts regard "barrier-free IT" as a paramount issue for the years ahead.
Barrier-free Concerns Everyone
Some companies are still holding back, probably because the issue has been reduced to the single aspect of equal rights for people with disabilities. But the demand for universally accessible information is not restricted to the needs of persons with disabilities. Consider the growing number of elderly people in society: that alone warrants preparing documents for the natural, age-related decline in sensory capacity. This is surely one reason why government agencies, and increasingly companies, offer their Web content in various font sizes. Or imagine e-mails being read aloud in the car by a computer voice. That, too, is universal accessibility, and it also requires structural information and metadata to be recorded in the document.
But there is yet another reason to concentrate on the issue. With the burgeoning availability of transactional documents on Web portals, semantic quality plays an increasingly important role, regardless of any legal obligations. Document workflows need to break out of the Letter-sized mold, for example, and prepare their content for other forms of output. Think of mobile devices. Serving them means progressively reassessing documents originally intended only for print and making them multi-channel capable. The documents should carry as much data as possible right up to final output, e.g., for archiving, which is where the necessary index data is embedded in the data stream.
Accessible Documents: Stop Destroying Information!
Data is often thrown away on the way to output, whatever the channel, but this practice is no longer in keeping with the times. Digital documents that could be read and processed by machine are often first converted into analog form, i.e., printed, and then into TIFF or JPEG files, turning the content into "pixel clouds." The actual content is thus locked in raster images and must be made readable again through optical character recognition (OCR). Not only is this cumbersome, it also loses the metadata needed for further processing.
Accessible PDF documents, by contrast, can be reflowed, e.g., from a Letter-sized page to a smartphone display, or converted into other formats (from a page-oriented back to a text-oriented format). Individual data items can be extracted (invoice line items, for instance), and tables of contents and indexes can be generated.
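Both extraction tasks become straightforward once the structure is tagged. The following minimal sketch again uses a hypothetical, simplified tag tree; `Node`, `walk`, `table_of_contents` and `invoice_items` are illustrative names, not a real PDF library's API. It builds a table of contents from the H1 to H6 headings and turns a tagged table into invoice items. The sketch makes two simplifying assumptions. First, it assumes `TR` rows sit directly under `Table`; real tagged tables may group them under `THead`/`TBody`. Second, it assumes the first row holds the `TH` header cells.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

@dataclass
class Node:
    """Hypothetical, simplified stand-in for a tagged-PDF structure element."""
    role: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

# Heading structure types H1..H6 mapped to their nesting level
HEADINGS = {f"H{i}": i for i in range(1, 7)}

def walk(node: Node) -> Iterator[Node]:
    """Yield every element in logical (depth-first) order."""
    yield node
    for child in node.children:
        yield from walk(child)

def table_of_contents(root: Node) -> List[str]:
    """Collect headings, indenting each one by its level."""
    return ["  " * (HEADINGS[n.role] - 1) + n.text
            for n in walk(root) if n.role in HEADINGS]

def invoice_items(table: Node) -> List[Dict[str, str]]:
    """Turn a table's rows into dicts keyed by the header cells of the first row."""
    rows = [r for r in table.children if r.role == "TR"]
    if not rows:
        return []
    headers = [c.text for c in rows[0].children if c.role == "TH"]
    return [dict(zip(headers, [c.text for c in row.children if c.role == "TD"]))
            for row in rows[1:]]

# Hypothetical tagged invoice
invoice = Node("Document", children=[
    Node("H1", text="Invoice"),
    Node("H2", text="Line items"),
    Node("Table", children=[
        Node("TR", children=[Node("TH", text="Item"), Node("TH", text="Qty"),
                             Node("TH", text="Price")]),
        Node("TR", children=[Node("TD", text="Paper"), Node("TD", text="10"),
                             Node("TD", text="4.99")]),
        Node("TR", children=[Node("TD", text="Toner"), Node("TD", text="1"),
                             Node("TD", text="59.00")]),
    ]),
])

table = next(n for n in walk(invoice) if n.role == "Table")
print(table_of_contents(invoice))
print(invoice_items(table))
```

From a scanned image of the same invoice, neither result could be obtained without OCR and guesswork about which text belongs to which column.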
In the final analysis, universal accessibility is hardly an issue just for people with disabilities. In document creation, processing and output, it represents a quantum leap. It is far more than mere political correctness.