Compart - Document- and Output-Management

Intelligent Documents: The well-informed Paper

"Intelligent documents" are digital documents that can do more than simply be printed and sent. They contain information, codes and data for processing on all physical and electronic channels, going far beyond pure output.

One phenomenon still observed in many companies is that electronic documents that could be read and processed by machine are first printed and then re-digitized as TIF or JPG files. Pixels are created from content. In other words, the actual content is first turned into raster images and then rendered "readable" again through optical character recognition (OCR). It is a cumbersome and superfluous procedure, one in which information important for downstream processing gets lost.

A better approach would be to create digital documents right from the start – documents that can do far more than be printed. Multi-channel capability warrants a closer look, and not only because delivery methods have become much more diverse over the last few years: important structural data is still being lost on the way to output, regardless of channel, which is no longer in keeping with the times.

"Intelligent documents" with interactive elements and data that can be passed to downstream applications are in demand. Examples of such data include instructions to printing and insertion systems or which attachments and how many of them should be added to a mailing.

Interactive content with unrestricted access

Gone are the days in which documents were just printed or, in the best of cases, sent electronically. Today documents are assuming the role of an "information container" that allows users to launch different actions that go well beyond generation and sending: processes such as data research, transactions, or storing archiving rules and control codes for cost-optimized mailing (combining multiple documents to the same recipient and making better use of bulk mailing rates).
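
The idea of combining multiple documents for the same recipient can be illustrated with a minimal Python sketch. It is not taken from any particular output management product: the `Document` fields, the `recipient_id` key and the attachment codes are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    recipient_id: str  # hypothetical key identifying the addressee
    pages: int
    inserts: list = field(default_factory=list)  # hypothetical attachment codes


def bundle_by_recipient(documents):
    """Group documents addressed to the same recipient into one mailing."""
    bundles = defaultdict(list)
    for doc in documents:
        bundles[doc.recipient_id].append(doc)

    mailings = []
    for recipient_id, docs in bundles.items():
        mailings.append({
            "recipient_id": recipient_id,
            "doc_ids": [d.doc_id for d in docs],
            "total_pages": sum(d.pages for d in docs),
            # Keep each attachment code once, in first-seen order.
            "inserts": list(dict.fromkeys(i for d in docs for i in d.inserts)),
        })
    return mailings


docs = [
    Document("INV-1", "C100", 2, ["flyer"]),
    Document("INV-2", "C200", 1),
    Document("LTR-7", "C100", 1, ["flyer", "terms"]),
]
mailings = bundle_by_recipient(docs)
```

Here the two documents for recipient `C100` end up in a single mailing, which is the kind of control information an inserting system could act on if it travels with the document.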

Different metadata, multimedia information, hyperlinks, business graphics and other data are added to documents originally destined purely for print, enabling them not only to be output through any given channel but also to be linked with other processes. "Intelligent documents" also support universal accessibility. Everybody, regardless of disability, must now have access to the full scope of the content, whether by means of an audio file or of PDF files that make content accessible to the visually impaired through automatic reflow and font size adjustment.

In Germany, for instance, all authorities and public facilities must implement universal accessibility. Standard forms and brochures need to be prepared so that a screen reader can correctly present the content in understandable speech. This also involves document enrichment. Information required for correct reproduction and output, regardless of channel, is saved in the form of metadata (a process also known as tagging). This data mainly comprises specifications on text structure such as reading direction, spoken language, column sequence, hyphenation, cross references, and references to footnotes. Tagging this structural information is essential for correct reproduction by a screen reader.
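
How structural metadata lets a screen reader present content in the right sequence can be sketched in a few lines of Python. This is a simplified model, not the tagging mechanism of any specific format: the `TaggedBlock` class, its `role`, `order` and `lang` fields, and the "Heading: " announcement are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaggedBlock:
    role: str        # e.g. "H1" or "P" (role names chosen for illustration)
    text: str
    order: int       # position in the logical reading order
    lang: str = "de"  # spoken language of the block


def reading_sequence(blocks):
    """Return (language, spoken text) pairs in logical reading order."""
    spoken = []
    for block in sorted(blocks, key=lambda b: b.order):
        prefix = "Heading: " if block.role.startswith("H") else ""
        spoken.append((block.lang, prefix + block.text))
    return spoken


# Blocks as they might come off a two-column page:
# the order on the page differs from the intended reading order.
blocks = [
    TaggedBlock("P", "Second column text.", order=3),
    TaggedBlock("H1", "Application form", order=1),
    TaggedBlock("P", "First column text.", order=2),
]
sequence = reading_sequence(blocks)
```

Without the `order` and `lang` values, software reading the page would have to guess the column sequence and the pronunciation, which is exactly the information that tagging preserves.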

Multi-channel-capable documents are also intelligent documents

In this context, the PDF/UA (Universal Accessibility) format plays an important role. Its publication as ISO standard 14289-1 in summer 2012 considerably simplifies the creation of generally accessible, and hence intelligent, documents. Even better, the format relies on tagged PDF, whose structure elements (headings, paragraphs, lists, tables) closely resemble HTML markup, so it fits well with HTML5. That text-based markup language is already setting the tone on mobile platforms. And it’s no wonder: HTML5 content can be easily processed for any electronic output channel, be it a smartphone or a Web site. And the document can still be printed if so desired. Conversion to PDF files of any page size is also possible.

HTML5 is currently the most intelligent format for the creation and display of documents, regardless of size or output channel. It allows reformatting, e.g., from A4 to smartphone display, or conversion from page formats to text-oriented formats. Individual data can be extracted (including retrieval of invoice items) and tables of contents and index lists can be built. And there is more. With HTML5, even audiovisual elements, Web links and charts can be embedded. This creates not only multi-channel-capable documents, but multimedia documents that offer users added value beyond just display of text.
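
Extracting individual data such as invoice items from an HTML5 document can be sketched with Python's standard `html.parser` module. The markup convention used here (table rows carrying `class="item"`) is an assumption for the example, not a standard; a real document would need its own selection rule.

```python
from html.parser import HTMLParser


class InvoiceItemParser(HTMLParser):
    """Collect the cell texts of every <tr class="item"> row."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._row = None   # cells of the item row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and ("class", "item") in attrs:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.items.append(tuple(self._row))
            self._row = None


# Hypothetical invoice fragment; the header row has no "item" class.
HTML_INVOICE = """
<table>
  <tr><th>Item</th><th>Amount</th></tr>
  <tr class="item"><td>Electricity</td><td>84.20</td></tr>
  <tr class="item"><td>Gas</td><td>41.75</td></tr>
</table>
"""

parser = InvoiceItemParser()
parser.feed(HTML_INVOICE)
items = parser.items
```

Because the content is still structured text rather than pixels, the invoice items come out directly, with no OCR step in between.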

Start small

It is high time that companies begin to address this issue. The projects do not necessarily need to be major – document enrichment can start quite small. A well-known regional energy supplier in southern Germany, for example, is working with keywords in accounting to make researching invoices in the archive much quicker. Specifically, all invoices for electricity, gas and water are first generated as AFP files (AFP is the common format for high-volume production printing) and converted to PDF/A format for archiving. From the AFP files, special applications grab key information such as customer name, type of delivery, invoice number and customer number, embedding them as index values in the archive file (PDF).
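
The step of grabbing key information and turning it into index values might look roughly like the following Python sketch. It assumes the invoice content is already available as plain text; the field labels, regular expressions and sample data are invented for illustration, and the actual AFP parsing and embedding into the PDF archive file (which would require dedicated tools) are not shown.

```python
import json
import re

# Hypothetical field patterns; real invoices will need their own layout rules.
FIELD_PATTERNS = {
    "customer_name": re.compile(r"^Customer:\s*(.+)$", re.MULTILINE),
    "customer_number": re.compile(r"^Customer No\.:\s*(\S+)$", re.MULTILINE),
    "invoice_number": re.compile(r"^Invoice No\.:\s*(\S+)$", re.MULTILINE),
    "delivery_type": re.compile(
        r"^Delivery:\s*(electricity|gas|water)$", re.MULTILINE | re.IGNORECASE
    ),
}


def extract_index_values(text):
    """Return the index fields found in the invoice text as a dict."""
    index = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            index[name] = match.group(1).strip()
    return index


# Invented sample data standing in for text taken from a print file.
SAMPLE_TEXT = """Customer: Erika Mustermann
Customer No.: 4711
Invoice No.: 2013-000123
Delivery: Gas
"""

index_values = extract_index_values(SAMPLE_TEXT)
# Serialized form that could be stored alongside the archived document.
index_json = json.dumps(index_values, sort_keys=True)
```

Fields that are not found are simply left out of the result, so an archive import could flag incomplete index records instead of storing guessed values.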

Often it is the proverbial small steps that lead to success. What matters is taking the first step, whether that means adding an electronic signature field, embedding audiovisual content or storing additional functions and rules for downstream processes. Enriched documents can become data carriers.
