Compart - Document- and Output-Management


Smart Digital Documents are Intelligent Documents


The "Educated Paper" or Smart Documents

One phenomenon still observed in many companies is that digital documents that could be read and processed by machine are first printed and then re-digitized as TIFF or JPEG files. Content is turned into pixels. In other words, the actual content is first locked away in raster images and then made "readable" again through optical character recognition (OCR). It is a cumbersome and superfluous procedure, and one in which information important for downstream processing gets lost.

A better approach would be to create smart digital documents right from the start: documents that can do far more than be printed. Multi-channel capability warrants a closer look, and not only because delivery methods have become much more diverse over the last few years. Important structural data is still being lost on the way to output, regardless of channel, which is no longer in keeping with the times. What is in demand are "intelligent documents" with interactive elements and data that can be passed to downstream applications. Examples of such data include instructions to printing and insertion systems, or which attachments, and how many of them, should be added to a mailing.



  • Interactive and accessible content
  • Smart documents = multichannel documents
  • Why HTML5?

Smart Digital Documents: Interactive and Accessible Content

Gone are the days when documents were just printed or, at best, sent electronically. Today documents are assuming the role of an "information container" that lets users launch actions going well beyond generation and sending: data research, transactions, or storing archiving rules and control codes for cost-optimized mailing (combining multiple documents for the same recipient and making better use of bulk mailing rates).
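The bundling idea behind cost-optimized mailing can be illustrated with a minimal Python sketch. It assumes that each document carries a recipient key as metadata; the `Document` class, its field names and the sample values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    recipient_id: str  # hypothetical key identifying the addressee
    title: str
    pages: int


def bundle_by_recipient(documents):
    """Group documents so that each recipient receives a single mailing."""
    bundles = defaultdict(list)
    for doc in documents:
        bundles[doc.recipient_id].append(doc)
    return dict(bundles)


# Invented sample data: two documents go to the same recipient.
docs = [
    Document("C-1001", "Electricity invoice", 2),
    Document("C-1002", "Gas invoice", 1),
    Document("C-1001", "Contract amendment", 3),
]

bundles = bundle_by_recipient(docs)
for recipient, items in bundles.items():
    total_pages = sum(d.pages for d in items)
    print(recipient, len(items), "document(s),", total_pages, "page(s)")
```

Because the recipient key travels with the document as data, no step in the chain has to re-read the printed page to decide which items belong in the same envelope.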


Metadata, multimedia information, hyperlinks, business graphics and other data are added to documents originally destined purely for print, so that they can not only be output through any given channel but also be linked with other processes. Smart documents also support universal accessibility. Everybody, regardless of disability, must have access to the full scope of the content, whether by means of an audio file or through PDF files whose automatic reflow and adjustable font size make content accessible to the visually impaired.

In Germany, for instance, all authorities and public facilities must implement universal accessibility. Standard forms and brochures need to be prepared so that a screen reader can present the content correctly as understandable speech. This also involves document enrichment: information required for correct reproduction and output, regardless of channel, is saved in the form of metadata (a process also known as tagging). This data mainly comprises specifications on text structure such as reading direction, spoken language, column sequence, hyphenation, cross references and references to footnotes. Tagging this information is essential for accessible output.
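As a rough illustration of what such structural metadata can look like, the following Python sketch marks up a short text with its spoken language and reading direction. It uses the standard HTML attributes `lang` and `dir` rather than PDF tagging; the function name `build_tagged_fragment` and the sample text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_tagged_fragment(title, paragraphs, language="en", direction="ltr"):
    """Wrap content in a section that declares language and reading direction."""
    section = ET.Element("section", {"lang": language, "dir": direction})
    heading = ET.SubElement(section, "h1")  # structural role: heading
    heading.text = title
    for text in paragraphs:
        paragraph = ET.SubElement(section, "p")  # structural role: paragraph
        paragraph.text = text
    return section


# Invented sample content for a standard form.
fragment = build_tagged_fragment(
    "Application for housing benefit",
    ["Please complete all fields.", "Return the form to your local office."],
)
markup = ET.tostring(fragment, encoding="unicode", method="html")
print(markup)
```

A screen reader can use the declared language to choose the right pronunciation, and the heading and paragraph elements to announce the structure of the text.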

Multichannel Documents Are Universally Accessible Intelligent Documents

In this context, the PDF/UA (Universal Accessibility) format plays an important role. The official publication of ISO standard 14289-1 in summer 2012 considerably simplifies the creation of generally accessible, and hence intelligent, documents.

HTML5 goes a step further. The text-based markup language is already setting the tone on mobile platforms. And it is no wonder: HTML5 content can easily be processed for any electronic output channel, be it a smartphone or a website. The document can still be printed if desired, and conversion to PDF files of any page size is also possible.


HTML5 is currently the most intelligent format for the creation and display of smart digital documents, regardless of size or output channel. It allows reformatting, e.g., from A4 to smartphone display, or conversion from page formats to text-oriented formats. Individual data can be extracted (including retrieval of invoice items) and tables of contents and index lists can be built. And there is more. With HTML5, even audiovisual elements, Web links and charts can be embedded. This creates not only multi-channel-capable documents, but multimedia documents that offer users added value beyond just display of text.
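How invoice items can be extracted and a table of contents built from an HTML5 document may be sketched as follows, using only Python's standard `html.parser` module. The sample markup, the `item` class on table rows and the `InvoiceParser` name are assumptions made for this illustration; real documents will be structured differently.

```python
from decimal import Decimal
from html.parser import HTMLParser

# Invented sample invoice; rows marked class="item" are line items.
INVOICE_HTML = """
<article>
  <h1>Invoice 2024-0042</h1>
  <h2>Electricity</h2>
  <table>
    <tr class="item"><td>Base charge</td><td>12.50</td></tr>
    <tr class="item"><td>Consumption</td><td>87.30</td></tr>
  </table>
  <h2>Payment terms</h2>
  <p>Payable within 14 days.</p>
</article>
"""

HEADING_TAGS = ("h1", "h2", "h3")


class InvoiceParser(HTMLParser):
    """Collects headings (for a table of contents) and invoice item rows."""

    def __init__(self):
        super().__init__()
        self.headings = []  # list of (level, text)
        self.items = []     # list of (description, amount) tuples
        self._heading_level = None
        self._in_item = False
        self._in_cell = False
        self._buffer = []
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_level = int(tag[1])
            self._buffer = []
        elif tag == "tr" and dict(attrs).get("class") == "item":
            self._in_item = True
            self._row = []
        elif tag == "td" and self._in_item:
            self._in_cell = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that belongs to a heading or an item cell.
        if self._heading_level is not None or self._in_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_level is not None:
            text = "".join(self._buffer).strip()
            self.headings.append((self._heading_level, text))
            self._heading_level = None
        elif tag == "td" and self._in_cell:
            self._row.append("".join(self._buffer).strip())
            self._in_cell = False
        elif tag == "tr" and self._in_item:
            self.items.append(tuple(self._row))
            self._in_item = False


parser = InvoiceParser()
parser.feed(INVOICE_HTML)

# Table of contents, indented by heading level.
for level, text in parser.headings:
    print("  " * (level - 1) + text)

# Invoice total computed from the extracted items.
total = sum(Decimal(amount) for _, amount in parser.items)
print("Total:", total)
```

The point of the sketch is that the content stays machine-readable: headings and amounts are read directly from the markup, with no OCR step in between.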


Starting Small

It is high time that companies began to address this issue. The projects do not need to be major; document enrichment can start quite small. A well-known regional energy supplier in southern Germany, for example, works with keywords in accounting to make researching invoices in the archive much quicker. Specifically, all invoices for electricity, gas and water are first generated as AFP files (AFP, Advanced Function Presentation, is the common format for high-volume production printing) and converted to PDF/A for archiving. Special applications pull key information such as customer name, type of delivery, invoice number and customer number from the AFP files and embed it as index values in the archive file (PDF).
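Why index values speed up archive research can be shown with a small Python sketch. It does not read AFP or PDF files; it assumes the extraction step has already delivered the index values as plain records. The record layout, file names and the `build_index` function are hypothetical and do not describe the supplier's actual system.

```python
from collections import defaultdict

# Invented index values, as an extraction step might deliver them.
archive_records = [
    {"file": "inv_0001.pdf", "customer_name": "A. Example",
     "delivery_type": "electricity", "invoice_number": "INV-0001",
     "customer_number": "C-1001"},
    {"file": "inv_0002.pdf", "customer_name": "B. Sample",
     "delivery_type": "gas", "invoice_number": "INV-0002",
     "customer_number": "C-1002"},
    {"file": "inv_0003.pdf", "customer_name": "A. Example",
     "delivery_type": "water", "invoice_number": "INV-0003",
     "customer_number": "C-1001"},
]


def build_index(records, fields):
    """Map each (field, value) pair to the archive files that carry it."""
    index = defaultdict(list)
    for record in records:
        for field in fields:
            index[(field, record[field])].append(record["file"])
    return dict(index)


index = build_index(
    archive_records,
    ("customer_number", "delivery_type", "invoice_number"),
)

# Look up all archived invoices for one customer without opening any file.
print(index.get(("customer_number", "C-1001"), []))
print(index.get(("delivery_type", "gas"), []))
```

A lookup by customer number or invoice number then becomes a dictionary access instead of a full-text search through every archived document.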

Often it is the proverbial small steps that lead to success. Taking the first step is what matters, whether it is adding an electronic signature field, embedding audiovisual content or storing additional functions and rules for downstream processes. Enriched documents can become data carriers, or smart digital documents, accessible to everybody.