Compart - Document- and Output-Management


Localizing Content: More Than Just Translation

Jeremias Märki

Multiple Languages in Document Processing

In countries like India, Canada and Switzerland, communicating in several languages is a must. But companies that want to offer more than the bare minimum in customer communication cannot avoid it either. For more and more people in today’s multicultural society, the primary language of the country where they reside is not necessarily their native tongue. Why not use this to your competitive advantage and adapt content accordingly?

To begin with, document and output management is all about content. Depending on the output channel, the content must be supplied in the language the recipient desires. Merely translating the text would still miss the mark, which is why in this context we talk about localization.

Level 0 – No Localization

To better understand the relevance and complexity of this topic, think of localization as an intellectual exercise with different levels of difficulty, starting with level 0. The first question is whether localization is required at all in a particular case. The United States is a case in point: it does not have an official language, English is by far the most commonly used language, and yet there is a significant need for localization. A report from the Migration Policy Institute found that 22% of the U.S. population does not speak English at home, which represents more than 72 million people. It is estimated that more than 300 languages are spoken in the United States. The U.S. federal government spends more than $4.3 billion on outsourced translation and interpreting services every year.

Level 1 – Multiple Languages for Domestic Recipients Only

Once you have decided to localize, you need to address further questions, such as whether to produce multilingual documents for domestic recipients only. In this case, you do not need to take country-specific considerations into account. But even then, translation is not enough. You may need to modify images or graphics in addition to text. In particular, personalized business documents contain variables in the text that require case-by-case decisions, and depending on the language, the sentence may need to be structured very differently. Take the following formulations for three cases (0, 1 and n): “has no other policies,” “has one other policy,” or “has three other policies.” You can't simply substitute a placeholder like “has {number} policies.”
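
The three cases can be sketched in a few lines of Python. This is a minimal illustration, not a complete solution: the catalog `MESSAGES`, the category names and the functions `plural_category` and `policy_phrase` are hypothetical, and the rule shown only fits English and German.

```python
# Hypothetical message catalog: one text variant per plural category and language.
# Category names and wording are illustrative assumptions.
MESSAGES = {
    "en": {
        "zero": "has no other policies",
        "one": "has one other policy",
        "other": "has {count} other policies",
    },
    "de": {
        "zero": "hat keine weiteren Policen",
        "one": "hat eine weitere Police",
        "other": "hat {count} weitere Policen",
    },
}


def plural_category(count: int) -> str:
    """Map a count to a category; this simple rule fits English and German only."""
    if count == 0:
        return "zero"
    if count == 1:
        return "one"
    return "other"


def policy_phrase(language: str, count: int) -> str:
    """Pick the variant for the language and count, then fill in the number."""
    template = MESSAGES[language][plural_category(count)]
    return template.format(count=count)


print(policy_phrase("en", 0))  # has no other policies
print(policy_phrase("en", 1))  # has one other policy
print(policy_phrase("en", 3))  # has 3 other policies
```

Other languages distinguish more cases than these three. The Unicode CLDR defines the plural categories zero, one, two, few, many and other, so a production system would typically take its rules from there rather than hardcoding them.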

In cases like this, business logic needs to be separated from linguistic logic, and each must be considered and implemented on its own. Take the example of car insurance. Many insurers offer their customers a discount if they park their car in a garage at home. On the business side, meeting this condition triggers a discount. On the linguistic side, output management is supplied with raw data that contains an indicator based on this rule and later generates an appropriate text block (e.g., identified by a unique name) in the document. The actual text is irrelevant at this point.

Later, during formatting (composition), the proper version of the text block is selected and the placeholder is replaced with the linguistically accurate text. The linguistic logic must therefore be implemented multiple times, once for each language, and perhaps even once per output format (print/PDF, HTML, etc.).
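
The garage example might look roughly like this when the two kinds of logic are kept apart. All names here (`PolicyRecord`, `apply_business_rules`, `TEXT_BLOCKS`, `select_text_blocks`) and the sample wording are assumptions made for the sketch, not part of any particular product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyRecord:
    """Raw data handed to output management (hypothetical structure)."""
    customer_language: str
    garage_discount: bool  # indicator set by the business rule


def apply_business_rules(parks_in_garage: bool, language: str) -> PolicyRecord:
    """Business side: decide on the discount without touching any text."""
    return PolicyRecord(customer_language=language, garage_discount=parks_in_garage)


# Linguistic side: text blocks identified by a unique name, one version per language.
TEXT_BLOCKS = {
    "garage_discount": {
        "en": "Because your car is parked in a garage, you receive a discount.",
        "de": "Da Ihr Fahrzeug in einer Garage steht, erhalten Sie einen Rabatt.",
    },
}


def select_text_blocks(record: PolicyRecord) -> List[str]:
    """Composition: resolve block names to the recipient's language."""
    names = []
    if record.garage_discount:
        names.append("garage_discount")
    return [TEXT_BLOCKS[name][record.customer_language] for name in names]


record = apply_business_rules(parks_in_garage=True, language="en")
print(select_text_blocks(record))
```

The business function never sees a sentence, and the text table never sees the discount rule. Adding a language then means adding entries to `TEXT_BLOCKS` only.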

Besides the complex replacement of variables, localization must address other elements as well. A date value from an XML file, e.g., “2015-10-16” (ISO 8601 or XML Schema format), has to be reformatted to “16 October 2015” or “10/16/2015.” Numbers, too, must be formatted appropriately. The layout also needs to take into account that some languages produce longer text, which can change line and page breaks. German, for example, tends to be less concise than English. Sorting is yet another aspect. In Germany, DIN 5007 stipulates that “ä” and “a” be treated the same in dictionaries, but not in telephone books, where “ä,” “ae” and “æ” are treated the same.
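
Two of these points, date reformatting and umlaut-aware sorting, can be sketched with the Python standard library. The style names `long` and `us_short`, the hardcoded month list and the function `din5007_key` are assumptions for illustration. The sort key is a simplified approximation of the two DIN 5007 variants and ignores the standard's tie-breaking rules.

```python
from datetime import date

# Month names are hardcoded for this sketch; a real system would load them per language.
MONTHS_EN = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]


def format_date(iso_value: str, style: str) -> str:
    """Reformat an ISO 8601 date string; the style names are hypothetical."""
    d = date.fromisoformat(iso_value)
    if style == "long":
        return f"{d.day} {MONTHS_EN[d.month - 1]} {d.year}"
    if style == "us_short":
        return f"{d.month:02d}/{d.day:02d}/{d.year}"
    raise ValueError(f"unknown style: {style}")


def din5007_key(word: str, variant: int) -> str:
    """Sort key: variant 1 (dictionaries) maps ä to a, variant 2 (phone books) maps ä to ae."""
    if variant == 1:
        table = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
    else:
        table = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
    return "".join(table.get(ch, ch) for ch in word.lower())


words = ["Afrika", "Ärger", "Adler"]
dictionary_order = sorted(words, key=lambda w: din5007_key(w, 1))
phone_book_order = sorted(words, key=lambda w: din5007_key(w, 2))
```

With these inputs, `format_date("2015-10-16", "long")` returns `16 October 2015`, and the two sort orders differ: the dictionary variant places “Ärger” after “Afrika,” while the phone book variant places it between “Adler” and “Afrika.”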

From an overall systems standpoint, uniform application of the Unicode standard is called for in order to prepare for additional languages in the future. A Western European 8-bit character set (e.g., ISO 8859-1) will not suffice in the long run. All the elements in the system landscape need to support multilingualism: applications, output management, and exchange formats. The entire system needs to be internationalized.
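
A quick way to see why an 8-bit character set falls short is to test which strings it can represent at all. The helper `can_encode` and the sample strings are assumptions chosen for this sketch.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of the text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# German fits into ISO 8859-1, but the euro sign, Polish and Thai do not.
samples = ["Straße", "100 €", "Łódź", "กรุงเทพ"]
latin1_results = {s: can_encode(s, "iso-8859-1") for s in samples}
utf8_results = {s: can_encode(s, "utf-8") for s in samples}
```

Only “Straße” survives ISO 8859-1 here, while UTF-8 handles all four samples. Even the euro sign is missing from ISO 8859-1, so the limitation shows up well before non-Latin scripts enter the picture.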


Level 2 – Beyond the Border


It gets even more complicated when a company sends documents outside the country. Here, output management isn’t the sole player. The design of a company’s web presence for an international audience must also be considered. Suddenly numbers and dates have to be formatted differently for each country. The amount “7.654,12 €” in Germany becomes “€ 7.654,12” in Austria and “Fr. 7’654.12” in Switzerland. Note the different thousands and decimal separators as well as the position of the currency symbol. Telephone numbers are also written differently depending on the country.
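
Country-specific amounts can be produced from one neutral value plus a small table of conventions. The table `CONVENTIONS` below is an assumption loosely modeled on common CLDR conventions for these three locales. It is not authoritative, and a company's house style may differ.

```python
from decimal import Decimal

# Hypothetical per-locale conventions; verify against CLDR or house style before use.
CONVENTIONS = {
    "de_DE": {"group": ".", "decimal": ",", "pattern": "{amount} {symbol}", "symbol": "€"},
    "de_AT": {"group": ".", "decimal": ",", "pattern": "{symbol} {amount}", "symbol": "€"},
    "de_CH": {"group": "’", "decimal": ".", "pattern": "{symbol} {amount}", "symbol": "Fr."},
}


def format_amount(value: Decimal, locale_code: str) -> str:
    """Format a monetary amount according to the conventions table."""
    conv = CONVENTIONS[locale_code]
    # Format with neutral separators first ("7,654.12"), then swap in the local ones.
    neutral = f"{value:,.2f}"
    integer_part, fraction = neutral.split(".")
    amount = integer_part.replace(",", conv["group"]) + conv["decimal"] + fraction
    return conv["pattern"].format(amount=amount, symbol=conv["symbol"])


amount = Decimal("7654.12")
formatted = {code: format_amount(amount, code) for code in CONVENTIONS}
```

Under these assumed conventions the same value yields `7.654,12 €` for `de_DE`, `€ 7.654,12` for `de_AT` and `Fr. 7’654.12` for `de_CH`. Using `Decimal` rather than `float` avoids rounding surprises with monetary amounts.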

Spelling is yet another example. In Switzerland, the “ß” character is not used, so “Straße” (street) becomes “Strasse.”

Even word choice can differ: Germany, Austria and Switzerland each use a different word for cream. This leads us from language specificity to country specificity, as seen in locale codes like “de_DE” for Germany and “de_AT” for Austria in software applications. You can take this even further by appending properties such as region.
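
Locale codes lend themselves to a simple fallback lookup: try the country-specific entry first, then the language-level default. The table `TERMS` and the functions `fallback_chain` and `lookup` are hypothetical. The word variants shown are common examples and not a complete glossary.

```python
# Hypothetical term table keyed by locale code; "de" is the language-level default.
TERMS = {
    "cream": {"de": "Sahne", "de_AT": "Obers", "de_CH": "Rahm"},
    "street": {"de": "Straße", "de_CH": "Strasse"},
}


def fallback_chain(locale_code: str) -> list:
    """'de_CH' -> ['de_CH', 'de']: the most specific code comes first."""
    parts = locale_code.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(term: str, locale_code: str) -> str:
    """Return the most specific variant available for the locale."""
    variants = TERMS[term]
    for code in fallback_chain(locale_code):
        if code in variants:
            return variants[code]
    raise KeyError(f"no translation of {term!r} for {locale_code}")
```

With this table, `lookup("cream", "de_AT")` returns the Austrian variant, while `lookup("street", "de_AT")` falls back to the general German entry because no Austrian override exists.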

Selecting data structures for exchange formats like XML can also be tricky; take address formatting, for example. The postal code can come after the city (USA), below the city (England), or before the city (Germany). The following examples illustrate the situation.

Compart North America Inc.
700 Commerce Drive, Suite 500
Oak Brook, IL 60523
United States

Compart AG
88 Wood Street
United Kingdom

Compart AG
Otto-Lilienthal-Str. 38
71034 Böblingen
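
One way to handle such differences is to store the address in structured fields and apply a line template per country. The templates in `ADDRESS_TEMPLATES` are simplified assumptions for this sketch. Real postal conventions have more variants than these three.

```python
# Hypothetical line templates per country; mainly the postal-code placement differs.
ADDRESS_TEMPLATES = {
    "US": ["{name}", "{street}", "{city}, {region} {postal_code}", "{country}"],
    "GB": ["{name}", "{street}", "{city}", "{postal_code}", "{country}"],
    "DE": ["{name}", "{street}", "{postal_code} {city}"],
}


def format_address(fields: dict, country_code: str) -> str:
    """Render structured address fields with the template of the target country."""
    lines = [line.format(**fields) for line in ADDRESS_TEMPLATES[country_code]]
    return "\n".join(line for line in lines if line.strip())


us_address = format_address(
    {"name": "Compart North America Inc.", "street": "700 Commerce Drive, Suite 500",
     "city": "Oak Brook", "region": "IL", "postal_code": "60523",
     "country": "United States"},
    "US",
)
de_address = format_address(
    {"name": "Compart AG", "street": "Otto-Lilienthal-Str. 38",
     "city": "Böblingen", "postal_code": "71034"},
    "DE",
)
```

Keeping the fields separate in the exchange format means the same record can be rendered for any target country. A preformatted address string would have to be parsed again before it could be rearranged.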

Other differences must also be addressed. Units of measure can differ: °C versus °F, centimeters versus inches, meters versus feet. Depending on the target country, a different paper size is used for printing (DIN A4 versus US Letter). Time zones and daylight saving time may also need consideration.

Level 3 – Target Groups & Culture

Each country has different legal requirements that can affect the documents issued. Special taxation rules (e.g., VAT) may need to be considered. Price labeling specifications may also differ. These are just a few of the topics related to compliance.

Another issue is how to address specific target groups. Culture, religion, aesthetics, clothing, the meaning of color, and titles in salutations all play a role in localization. One familiar example is the photograph of a friendly customer-service representative. She is wearing a headset and smiling at the camera. In the German-speaking world, she has Caucasian facial features, modern European attire, and no head covering. All well and good in Europe, but is this appropriate in countries in Africa or Asia? Hardly.

We have already discussed the relevance of word choice. It gets even more complicated when the writing direction departs from the left-to-right, top-to-bottom order of Latin-script languages. Hebrew and Arabic are written from right to left and then top to bottom (RL-TB), and traditional Asian scripts from top to bottom and then right to left (TB-RL). Spelling and hyphenation rules also differ from language to language.
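
For horizontal text, the base direction of a string can be estimated from its first strongly directional character. The function `base_direction` is a simplified sketch in the spirit of the Unicode Bidirectional Algorithm's paragraph-level rule, not a full implementation, and it does not cover vertical layouts.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar scripts
            return "rtl"
        if bidi_class == "L":  # Latin, Greek, Cyrillic and others
            return "ltr"
    return "ltr"  # no strong character found: fall back to a default


directions = {
    "hebrew": base_direction("שלום"),
    "arabic": base_direction("مرحبا"),
    "english": base_direction("Hello"),
    "digits_then_hebrew": base_direction("123 שלום"),
}
```

Digits and spaces carry no strong direction, so a string that starts with a number and continues in Hebrew is still classified as right-to-left. A composition engine would use a result like this to choose paragraph alignment and mirroring.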

Calendars differ as well. Holidays are regional, and Monday is not the first day of the week everywhere. Even our standard Gregorian calendar is not used throughout the world. In Thailand, for instance, the Buddhist calendar is used.
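
Two of these differences are easy to show in code: the year offset of the Thai Buddhist calendar and the first day of the week. The table `FIRST_WEEKDAY` is a hypothetical two-country excerpt, and the function names are assumptions for this sketch.

```python
import calendar
from datetime import date

# Thai solar calendar: Buddhist Era year = Gregorian year + 543.
BUDDHIST_ERA_OFFSET = 543

# Hypothetical excerpt of a first-day-of-week table; real data would cover all countries.
FIRST_WEEKDAY = {"DE": calendar.MONDAY, "US": calendar.SUNDAY}

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def thai_year(d: date) -> int:
    """Convert the Gregorian year of a date to the Buddhist Era year."""
    return d.year + BUDDHIST_ERA_OFFSET


def week_header(country: str) -> list:
    """Rotate the weekday names so the country's first weekday comes first."""
    first = FIRST_WEEKDAY[country]
    return DAY_NAMES[first:] + DAY_NAMES[:first]


print(thai_year(date(2015, 10, 16)))  # 2558
print(week_header("US"))  # starts with Sun
print(week_header("DE"))  # starts with Mon
```

Only the displayed year changes in the Thai case; months and days match the Gregorian calendar. Regional holidays would need separate data per region and are not covered here.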

Localization involves many aspects, which represent challenges to a greater or lesser degree. Each situation must be analyzed and addressed individually.

Content versus Presentation

In addition to separating business logic from linguistic logic, internationalization requires differentiating content from presentation. Issues that need consideration include:

  • How do I represent information in exchange formats that are largely language-independent?
  • How do I present this information to the recipient?
  • Does the address need to be saved in a structured format or is it already available in the best format for conversion? Do both formats need to be maintained?
  • Which aspects are internationalized and where? In the technical application, in output management, or someplace else? Can some things be handled centrally for all output channels so that they don’t need individual conversion for different technologies?

Here, the strict separation of content from presentation using appropriate technologies (such as XML) helps enormously. Complexity can be reduced by dividing individual tasks into sequential process steps. The different databases maintained by the Unicode Consortium are also useful, particularly the Common Locale Data Repository (CLDR).
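
As a rough illustration of sequential process steps, the sketch below reads a language-neutral XML document and only afterwards decides how to present it. The element names, the sample values and the functions `extract` and `render` are all hypothetical. Locale-specific formatting of the date and amount is left out and would plug into the second step.

```python
import xml.etree.ElementTree as ET

# Hypothetical exchange document: values are stored language-neutrally
# (ISO 8601 date, plain decimal number, currency code, language and country codes).
INVOICE_XML = """
<invoice>
  <recipient language="de" country="CH"/>
  <due-date>2015-10-16</due-date>
  <total currency="CHF">7654.12</total>
</invoice>
"""


def extract(xml_text: str) -> dict:
    """Step 1: read the content; no formatting decisions are made here."""
    root = ET.fromstring(xml_text.strip())
    recipient = root.find("recipient")
    total = root.find("total")
    return {
        "locale": f'{recipient.get("language")}_{recipient.get("country")}',
        "due_date": root.findtext("due-date"),
        "total": total.text,
        "currency": total.get("currency"),
    }


def render(content: dict, channel: str) -> str:
    """Step 2: presentation; each channel wraps the same content differently."""
    line = f'{content["due_date"]} {content["total"]} {content["currency"]}'
    if channel == "html":
        language_tag = content["locale"].replace("_", "-")
        return f'<p lang="{language_tag}">{line}</p>'
    return line


content = extract(INVOICE_XML)
html_output = render(content, "html")
print_output = render(content, "print")
```

Because `extract` knows nothing about output channels, a new channel only requires another branch in `render`, and the exchange format stays untouched.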

Turning the Compulsory into the Voluntary

In sum, localization is more than just translation. It takes many factors into account, and not every aspect needs to be addressed in every case. This article is simply intended to provide some food for thought. A company can earn considerable goodwill from its customers by offering communications in more than just the required language(s). Localization need not be approached only as a necessity; it can also be a voluntary choice.

Localization is driven not only by growing compliance requirements but also by the needs of marketing departments. Of course, a cost-benefit analysis is needed to determine just how far to go. The impact on the IT architecture can be quite extensive, so it is essential to address localization in document and output management before investing in new systems.