Toptal authors are vetted experts in their fields and write on topics in which they have demonstrated experience. All of our content is peer reviewed and validated by Toptal experts in the same field.

Igor Santos

With a big eye on UI/UX, Igor is a developer with a strong PHP background (10+ years), moving into JS-land for more interactive experiences.

Years of Experience: 13


Whether you are building a website or a full-fledged web application, making it accessible to a wider audience often requires it to be available in different languages and locales.

Fundamental differences between most human languages make this anything but easy. The differences in grammar rules, language nuances, date formats, and more combine to make localization a unique and formidable challenge.

Consider this simple example.

Rules of pluralization in English are pretty straightforward: you can have a singular form of a word or a plural form of a word.

Other languages, though, such as the Slavic languages, have two plural forms in addition to the singular one. Some, such as Slovenian, Irish, and Arabic, have a total of four, five, and six plural forms, respectively.

The way your code is organized, and how your components and interface are designed, plays an important role in determining how easily you can localize your application.

Internationalization (i18n) of your codebase helps ensure that it can be adapted to different languages or regions with relative ease. Internationalization is usually done once, preferably at the beginning of the project, to avoid needing huge changes in the source code down the road.

How to Build a Multilingual App: A Demo with PHP and Gettext

Once your codebase has been internationalized, localization (l10n) becomes a matter of translating the contents of your application to a specific language/locale.

Localization needs to be performed every time a new language or region needs to be supported. Also, whenever a part of the interface containing text is updated, new content becomes available, which then needs to be localized (i.e., translated) into all supported locales.

In this article, we will learn how to internationalize and localize software written in PHP. We will go through the various implementation options and the different tools available to ease the process.

Tools for Internationalization

The easiest way to internationalize PHP software is by using array files. Arrays are populated with translated strings, which can then be looked up from within templates.
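As a rough sketch of the array-file approach, each locale gets an array keyed by the source string, and templates call a lookup helper that falls back to the key when no translation exists. The file layout, the `$translations` array, and the `t()` helper are hypothetical names used only for illustration:

```php
<?php
// Hypothetical per-locale data. In a real project each locale would live in
// its own file (e.g. lang/pt_BR.php) that `return`s its array for require.
$translations = [
    'pt_BR' => [
        'Welcome!' => 'Bem-vindo!',
        'Log out'  => 'Sair',
    ],
];

/**
 * Hypothetical lookup helper: returns the translation of $key for $locale,
 * or the key itself when the locale or the key is missing.
 */
function t(array $translations, string $locale, string $key): string
{
    return $translations[$locale][$key] ?? $key;
}

// In a template:
echo t($translations, 'pt_BR', 'Welcome!'), PHP_EOL; // translated
echo t($translations, 'pt_BR', 'Sign up'), PHP_EOL;  // untranslated, key shown as-is
```

Nothing in this helper handles interpolation or plural forms; every such feature would have to be built by hand on top of it.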

This is, however, hardly a recommended way for serious projects, as it will definitely pose maintenance issues down the road. Some issues might even appear at the very beginning, such as the lack of support for variable interpolation or noun pluralization.

One of the most classic tools (often taken as the reference for i18n and l10n) is a Unix tool called Gettext.

Though it dates back to 1995, it is still a comprehensive tool for translating software. It is easy to get started with, yet it is backed by powerful supporting tools.

Gettext is what we’ll be using in this post. We will be presenting a great GUI application that can be used to easily update your l10n source files, thereby avoiding the need to deal with the command line.

Libraries To Make Things Easy


There are major PHP web frameworks and libraries that support Gettext and other implementations of i18n. Some are easier to install than others, sport additional features, or support different i18n file formats. Although this document focuses on the tools provided with the PHP core, here’s a list of some others worth mentioning:

  • oscarotero/Gettext: Gettext support with an object-oriented interface; includes improved helper functions, powerful extractors for several file formats (some of them not supported natively by the gettext command). Can also export to formats beyond just .mo/.po files, which can be useful if you need to integrate your translation files into other parts of the system, like a JavaScript interface.

  • symfony/translation: Supports a lot of different formats, but recommends using verbose XLIFF files. Doesn’t include helper functions or a built-in extractor, but supports placeholders using strtr() internally.

  • zend/i18n: Supports array and INI files, or Gettext formats. Implements a caching layer to avoid needing to read the file system every time. Also includes view helpers, and locale-aware input filters and validators. However, it has no message extractor.

Other frameworks also include i18n modules, but those are not available outside of their codebases:

  • Laravel: Supports basic array files; has no automatic extractor but includes a @lang helper for template files.

  • Yii: Supports array, Gettext, and database-based translation, and includes a messages extractor. Backed by the Intl extension, available since PHP 5.3, and based on the ICU project. This enables Yii to run powerful replacements, like spelling out numbers, formatting dates, times, intervals, currency, and ordinals.

If you decide to go for one of the libraries that provide no extractors, you may want to use the Gettext formats, so you can use the original Gettext toolchain (including Poedit) as described in the rest of this article.

Installing Gettext

You might need to install Gettext and the related PHP extension using your package manager, such as apt-get or yum. Once it’s installed, enable it by adding extension=gettext.so (Linux/Unix) or extension=php_gettext.dll (Windows) to your php.ini file.
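Once php.ini is updated, a quick way to confirm that the extension is active is a check like the one below. The `gettextAvailable()` name is hypothetical; extension_loaded() and function_exists() are standard PHP functions. On PHP 7.2 and later, the php.ini line can also be written simply as extension=gettext.

```php
<?php
/**
 * Hypothetical helper: reports whether the gettext extension is loaded
 * in the PHP build running this script.
 */
function gettextAvailable(): bool
{
    return extension_loaded('gettext') && function_exists('gettext');
}

if (gettextAvailable()) {
    echo "gettext is enabled", PHP_EOL;
} else {
    // `php --ini` lists the configuration files the CLI actually loads.
    echo "gettext is missing: check php.ini (run `php --ini` to find it)", PHP_EOL;
}
```

Keep in mind that the CLI and the web server may load different php.ini files, so run the check in the same environment that serves your pages.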

Here we will also be using Poedit to create translation files. You will probably find it in your system’s package manager; it’s available for Unix, Mac, and Windows and can be downloaded for free on its website as well.

Types of Gettext Files

There are three file types you usually deal with while working with Gettext.

The main ones are PO (Portable Object) and MO (Machine Object) files, the first being a list of readable “translated objects” and the second being the corresponding binary (to be interpreted by Gettext when doing localization). There’s also a POT (PO Template) file that simply contains all existing keys from your source files and can be used as a guide to generate and update all PO files.

The template files are not mandatory; depending on the tool you’re using to do l10n, you’ll be just fine with only PO/MO files. You’ll have one pair of PO/MO files per language and region, but only one POT per domain.

Separating Domains

In big projects, there are some cases where you might need to separate translations, such as when the same words convey different meanings in different contexts.

In those cases, you’ll need to split them into different “domains,” which are basically named groups of POT/PO/MO files, where the filename is the translation domain.

Small and medium-sized projects usually, for simplicity, use only one domain; its name is arbitrary, but we will be using “main” for our code samples.

In Symfony projects, for example, domains are used to separate the translation for validation messages.

Locale Code

A locale is simply a code that identifies one version of a language. It’s defined following the ISO 639-1 and ISO 3166-1 alpha-2 specs: two lower-case letters for the language, optionally followed by an underscore and two upper-case letters identifying the country or regional code.

For rare languages, three letters are used.

To some speakers, the country part may seem redundant. However, some languages have dialects in different countries, such as Austrian German (de_AT) or Brazilian Portuguese (pt_BR). The second part is used to distinguish between those dialects; when it’s not present, the locale is taken as a “generic” or “hybrid” version of the language.
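A small sketch of the naming scheme just described: the hypothetical `parseLocale()` helper splits a code into its language part (two or three lower-case letters) and its optional region part (two upper-case letters). It only checks the shape of the code, not whether the letters are real ISO 639 or ISO 3166 entries.

```php
<?php
/**
 * Hypothetical helper: splits "pt_BR", "de" or "fil_PH" into language and
 * region. Returns null when the code doesn't match the expected shape.
 */
function parseLocale(string $code): ?array
{
    if (!preg_match('/^([a-z]{2,3})(?:_([A-Z]{2}))?$/', $code, $m)) {
        return null;
    }
    return [
        'language' => $m[1],
        // An unmatched trailing group is absent from $m; null = "generic" language.
        'region'   => $m[2] ?? null,
    ];
}

var_dump(parseLocale('pt_BR'));
```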

Directory Structure

To use Gettext, we will need to adhere to a specific structure of folders.

First, you’ll need to select an arbitrary root for your l10n files in your source repository. Inside it, you’ll have a folder for each needed locale, and a fixed “LC_MESSAGES” folder that will contain all your PO/MO pairs.

(Figure: the LC_MESSAGES folder structure)
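The folder layout can be expressed as a simple path pattern, <root>/<locale>/LC_MESSAGES/<domain>.po (or .mo). The hypothetical `translationFile()` helper below builds such paths; “main” is the domain name used in this article’s samples, and “locales” is just an example root. This is also the layout bindtextdomain() relies on: you pass it the root folder, and Gettext looks for <locale>/LC_MESSAGES/<domain>.mo beneath it.

```php
<?php
/**
 * Hypothetical helper: path of a translation file in the Gettext layout.
 * Forward slashes are used because PHP accepts them on Windows as well.
 */
function translationFile(string $root, string $locale, string $domain = 'main', string $ext = 'po'): string
{
    return rtrim($root, '/') . "/$locale/LC_MESSAGES/$domain.$ext";
}

echo translationFile('locales', 'pt_BR'), PHP_EOL;              // source file edited in Poedit
echo translationFile('locales', 'pt_BR', 'main', 'mo'), PHP_EOL; // compiled file read at runtime
```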

Plural Forms

As we said in the introduction, different languages might sport different pluralization rules. However, Gettext saves us this trouble.

When creating a new .po file, you’ll have to declare the pluralization rules for that language, and translated pieces that are plural-sensitive will have a different form for each of those rules.

When calling Gettext in code, you’ll have to specify a number related to the sentence (e.g., for the phrase “You have n messages.”, you will need to specify the value of n), and Gettext will work out the correct form to use. Inserting the number itself into the sentence is then a matter of string substitution, such as with sprintf().

A plural rule declaration consists of the number of forms (nplurals) and an expression (plural) that maps a number n to the index of the form to use; for languages with two forms, that expression is simply a boolean test. For example:

  • Japanese: nplurals=1; plural=0; (one form, since there are no plural forms)

  • English: nplurals=2; plural=(n != 1); (two forms: use the plural form only when n is not 1; otherwise, use the singular form)

  • Brazilian Portuguese: nplurals=2; plural=(n > 1); (two forms: use the plural form only when n is greater than 1; otherwise, use the singular form)
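To see what those expressions compute, here they are hand-translated into PHP closures (arrow functions require PHP 7.4+). Each closure maps a count n to the index of the msgstr[] form to use. This is only an illustration; Gettext evaluates the Plural-Forms header itself, so you never write these rules in PHP.

```php
<?php
// Hand-written equivalents of the "plural=" expressions listed above.
$pluralRules = [
    'ja'    => fn (int $n): int => 0,               // nplurals=1; plural=0;
    'en'    => fn (int $n): int => (int) ($n != 1), // nplurals=2; plural=(n != 1);
    'pt_BR' => fn (int $n): int => (int) ($n > 1),  // nplurals=2; plural=(n > 1);
];

foreach ([0, 1, 2] as $n) {
    printf(
        "n=%d  ja=%d  en=%d  pt_BR=%d\n",
        $n,
        $pluralRules['ja']($n),
        $pluralRules['en']($n),
        $pluralRules['pt_BR']($n)
    );
}
```

Note how n = 0 selects the plural form in English but the singular form in Brazilian Portuguese, which is exactly the kind of difference the Plural-Forms header captures.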

For a deeper explanation, there’s an informative LingoHub tutorial available online.

Gettext will determine which rule to use based on the number provided and will use the correct localized version of the string. For strings where pluralization needs to be handled, you will need to include in the .po file a different sentence for each plural rule defined.

Sample Implementation

After all that theory, let’s get a little practical. Here’s an excerpt of a .po file (don’t worry too much about the syntax yet; just get a sense of the overall content):

msgid ""
msgstr ""
"Language: pt_BR\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

msgid "We're now translating some strings"
msgstr "Nós estamos traduzindo algumas strings agora"

msgid "Hello %1$s! Your last visit was on %2$s"
msgstr "Olá %1$s! Sua última visita foi em %2$s"

msgid "Only one unread message"
msgid_plural "%d unread messages"
msgstr[0] "Só uma mensagem não lida"
msgstr[1] "%d mensagens não lidas"

The first section works like a header, with an empty msgid and msgstr.

It describes the file encoding, plural forms, and a few other things. The second section translates a simple string from English to Brazilian Portuguese, and the third does the same, but leverages string replacement from sprintf, enabling the translation to contain the username and visit date.
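The sprintf() step can be sketched in isolation. Here the msgstr from the sample file is hard-coded instead of being fetched with gettext(), so the snippet runs without the extension; the second, reordered sentence is a made-up example, not part of the sample file. Positional placeholders such as %1$s are what let a translator move the arguments around when the target language needs a different word order.

```php
<?php
// msgstr copied from the sample .po file (single quotes keep "$s" literal).
$template = 'Olá %1$s! Sua última visita foi em %2$s';
echo sprintf($template, 'Igor', '15/02/2016'), PHP_EOL;

// Hypothetical translation that swaps the argument order.
$reordered = 'Sua última visita foi em %2$s, %1$s!';
echo sprintf($reordered, 'Igor', '15/02/2016'), PHP_EOL;
```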

The last section is a sample of pluralization forms, displaying the singular and plural versions in English as msgid and msgid_plural, and their corresponding translations as msgstr[0] and msgstr[1] (following the index returned by the plural rule).

There, string replacement is used as well, so the number appears directly in the sentence through %d. Plural entries always have exactly two source strings (msgid and msgid_plural), so it’s advisable not to use a language with complex plural rules as the source language.
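To make the selection concrete, here is a pure-PHP emulation of what ngettext() followed by sprintf() does with the plural entry above. The `$catalog` array, the `$ptBrPlural` closure, and the `pluralLookup()` function are hypothetical stand-ins (the arrow function requires PHP 7.4+); in real code you would call ngettext() and let Gettext read the compiled .mo file.

```php
<?php
// Plural entry from the sample pt_BR .po file, keyed by msgid.
$catalog = [
    'Only one unread message' => [
        'msgid_plural' => '%d unread messages',
        'msgstr'       => ['Só uma mensagem não lida', '%d mensagens não lidas'],
    ],
];

// From the header: Plural-Forms: nplurals=2; plural=(n > 1);
$ptBrPlural = fn (int $n): int => (int) ($n > 1);

/**
 * Hypothetical ngettext() emulation: pick the msgstr[] form chosen by the
 * plural rule; without an entry, return the source singular/plural string.
 */
function pluralLookup(array $catalog, callable $rule, string $singular, string $plural, int $n): string
{
    if (!isset($catalog[$singular])) {
        return $n == 1 ? $singular : $plural;
    }
    return $catalog[$singular]['msgstr'][$rule($n)];
}

foreach ([1, 3] as $n) {
    // The lookup only selects the string; sprintf() inserts the count.
    $form = pluralLookup($catalog, $ptBrPlural, 'Only one unread message', '%d unread messages', $n);
    echo sprintf($form, $n), PHP_EOL;
}
```

The fallback branch mirrors Gettext’s behavior when no translation is found (the singular msgid for n == 1, the plural one otherwise), which is why untranslated plural strings still read naturally in the source language.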

Localization Keys

As you may have noticed, we’re using the actual English sentence as the source ID. The same msgid is used throughout all your .po files, meaning other languages will have the same format and the same msgid fields but translated msgstr lines.

Speaking of translation keys, there are two standard “philosophical” approaches here:

1. msgid as a real sentence

The main advantages of this approach are:

  • If there are parts of the software untranslated in any given language, the key displayed will still maintain some meaning. For example, if you know how to translate from English to Spanish but need help translating to French, you might publish the new page with missing French sentences, and parts of the website would be displayed in English instead.

  • It’s much easier for the translator to understand what’s going on and make a proper translation based on the msgid.

  • It gives you “free” l10n for one language: the source one.

On the other hand, the primary disadvantage is that, if you need to change the actual text, you need to replace the same msgid across several language files.

2. msgid as a unique, structured key

This describes the sentence’s role in the application in a structured way, including the template or part where the string is located, instead of its content.

This is a great way to keep the code organized, separating the text content from the template logic. However, it could cause problems for translators, who would lack the context.

A source language file would be needed as a basis for other translations. For example, the developer would ideally have an “en.po” file, that translators would read to understand what to write in “fr.po”.

Missing translations would display meaningless keys on screen (“top_menu.welcome” instead of “Hello there, User!” on the said untranslated French page).

That’s good, as it would force translations to be complete before publishing, but bad, as any translation issue would be glaringly visible in the interface. Some libraries, though, include an option to specify a given language as a “fallback,” giving behavior similar to that of the other approach.
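A fallback of that kind can be sketched in a few lines. The `$messages` catalogs and the `translateKey()` helper are hypothetical; they only show the lookup order such an option implies: requested locale first, then the fallback locale, and the raw key as a last resort.

```php
<?php
// Structured keys (the second approach), with hypothetical catalogs.
$messages = [
    'en' => ['top_menu.welcome' => 'Hello there, User!'],
    'fr' => [], // French translation not done yet
];

/**
 * Hypothetical lookup: requested locale, then fallback locale, then the key.
 */
function translateKey(array $messages, string $key, string $locale, string $fallback = 'en'): string
{
    return $messages[$locale][$key]
        ?? $messages[$fallback][$key]
        ?? $key;
}

echo translateKey($messages, 'top_menu.welcome', 'fr'), PHP_EOL;       // English fallback
echo translateKey($messages, 'top_menu.welcome', 'fr', 'de'), PHP_EOL; // no match: raw key
```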

The Gettext manual favors the first approach as, in general, it’s easier for translators and users in case of trouble. That’s the approach we’ll be using here as well.

It should be noted, though, that the Symfony documentation favors keyword-based translation, so that any translation, including the source-language one, can change without affecting the templates.

Everyday Usage

In a common application, you would use some Gettext functions while writing static text in your pages.

Those sentences would then appear in .po files, get translated, compiled into .mo files, and then used by Gettext when rendering the actual interface. Given that, let’s tie together what we have discussed so far in a step-by-step example:

1. A sample template file, including some different gettext calls
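A minimal sketch of such a template fragment follows, using the strings from the sample .po file: _() (the standard alias of gettext()) for a plain string, sprintf() around a translated string with placeholders, and ngettext() for a plural. The `renderHeader()` function and its arguments are hypothetical. So that the sketch also runs where the extension is missing, it declares pass-through stand-ins that return the untranslated source strings; with the extension loaded but no translation found, the real functions likewise return the source strings.

```php
<?php
// Stand-ins, declared only when the gettext extension is not loaded.
if (!function_exists('_')) {
    function _(string $message): string { return $message; }
}
if (!function_exists('ngettext')) {
    function ngettext(string $singular, string $plural, int $count): string
    {
        return $count == 1 ? $singular : $plural;
    }
}

/** Hypothetical template fragment: greeting, last visit and unread count. */
function renderHeader(string $name, string $lastVisit, int $unread): string
{
    $safeName = htmlspecialchars($name, ENT_QUOTES); // escape user data in HTML
    $html  = '<p>' . _("We're now translating some strings") . "</p>\n";
    $html .= '<p>' . sprintf(_('Hello %1$s! Your last visit was on %2$s'), $safeName, $lastVisit) . "</p>\n";
    $html .= '<p>' . sprintf(ngettext('Only one unread message', '%d unread messages', $unread), $unread) . "</p>\n";
    return $html;
}

echo renderHeader('Igor', '2016-02-15', 2);
```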


Located in Rio de Janeiro, State of Rio de Janeiro, Brazil

Member since February 15, 2016
