Using a UTF-8 Encoding

This article explains how to configure your TYPO3 website to use UTF-8 encoding instead of the default ISO 8859-1 (a.k.a. Latin1) for both database storage and page rendering.

The information comes partly from my own experience and partly from two main sources:

The sections below explain how to create a new TYPO3 website configured for UTF-8. If you wish to convert an existing website, skip these sections and jump to “Converting an Existing Database”.

Creating the Database in UTF-8

The SQL statements below create a database whose encoding is UTF-8. We use the utf8_general_ci collation because it makes string comparisons case-insensitive, which we think is the best behaviour in most cases.

Creating the database:

CREATE DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON db_name.* TO username@localhost IDENTIFIED BY 'password';

Environment Configuration

Apache (optional)

Add the line

AddDefaultCharset utf-8

to your virtual host definition.

PHP

Edit php.ini to load the required extensions:

extension=php_iconv.so
extension=php_mbstring.so

TYPO3 selects which of the two libraries to use from the setting in localconf.php (see below).

Debian Notice

On a Debian server, do not modify php.ini; instead, create two files in /etc/php5/conf.d:

php_iconv.ini

; configuration for php iconv module
extension=php_iconv.so

php_mbstring.ini

; configuration for php mbstring module
extension=php_mbstring.so

However, if you use PHP 5.2, you do not need to include those libraries.
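
To confirm that both extensions are actually available, the module list printed by php -m can be checked. The following is only a minimal sketch: the function name check_php_modules is made up for this example, and it assumes that the php command-line binary reads the same configuration as the web server, which is not always the case.

```shell
#!/bin/sh
# Hypothetical helper: reads a module list (one name per line, as printed
# by "php -m") on stdin and reports whether iconv and mbstring are in it.
check_php_modules() {
    modules=$(cat)
    for m in iconv mbstring; do
        # -x matches whole lines only, -i ignores case, -q suppresses output
        if printf '%s\n' "$modules" | grep -qix "$m"; then
            echo "$m: loaded"
        else
            echo "$m: missing"
        fi
    done
}

# Usage:
#   php -m | check_php_modules
```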

TYPO3 Install Tool

In the 5th menu option:

  • setDBinit is set to SET NAMES utf8
  • UTF8filesystem is left unchecked
  • forceCharset is set to utf-8

Most of the time, it is sufficient to edit and add the following configuration to your localconf.php:

// For backend charset
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
 
// For GIFBUILDER support
// Set it to 'iconv' or 'mbstring'
$TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'iconv';
$TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'iconv';
 
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';

TypoScript Configuration

We now have a well-configured backend. The remaining step is to ensure that generated pages use UTF-8 encoding.

Depending on the type of access you have to the server hosting your TYPO3 website, you might not be able to apply the Apache configuration described above. In any case, it is always a good idea to add a few configuration instructions to the setup part of your template:

config.doctype = xhtml_trans
config.renderCharset = utf-8
config.additionalHeaders = Content-Type:text/html;charset=utf-8
config.xhtml_cleaning = all

With these settings, both the HTTP header and the charset declared in the page itself are set to UTF-8 and sent to the browser.

The information below shows what is sent:

HTTP Header
Content-Type: text/html; charset=utf-8
XML Encoding
<?xml version="1.0" encoding="utf-8"?>
Meta Tag
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
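
One way to check the header from a shell is to fetch it with curl and extract the charset. This is a rough sketch rather than a complete check: charset_from_headers is a hypothetical helper written for this example, and www.example.com stands in for your own site.

```shell
#!/bin/sh
# Hypothetical helper: reads raw HTTP response headers on stdin and prints
# the charset announced in the Content-Type header, if any.
charset_from_headers() {
    # HTTP header lines end in CR LF, so drop the CR first. The header name
    # is matched with either capitalisation of "Content-Type".
    tr -d '\r' | sed -n 's/^[Cc]ontent-[Tt]ype:.*charset=\([^; ]*\).*$/\1/p'
}

# Usage (-s silences the progress meter, -I requests the headers only):
#   curl -sI http://www.example.com/ | charset_from_headers
```

If the command prints nothing, the server did not announce a charset in its Content-Type header.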

Converting an Existing Database

Like me, you may have installed TYPO3 the first time without thinking much about the database encoding. After all, it works, doesn’t it? Here is how to convert your database from ISO 8859-1 to UTF-8.

Let’s start with a database export from a shell on your MySQL server:

$ mysqldump -u (user) -p db_name \
--ignore-table=db_name.sys_refindex > dump.sql

Now fix the encoding:

$ sed -e 's/latin1/utf8/g' -i dump.sql
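
Note that this substitution rewrites every occurrence of “latin1” in the dump, including any that appear inside your content. A more cautious variant, sketched below, only touches charset and collation declarations. The function name fix_dump_charset is hypothetical, and the patterns assume the declarations look like the ones mysqldump usually writes (DEFAULT CHARSET=latin1, CHARACTER SET latin1, COLLATE latin1_...).

```shell
#!/bin/sh
# Hypothetical helper: reads an SQL dump on stdin and writes it to stdout
# with latin1 charset and collation declarations replaced by utf8 ones.
# The word "latin1" anywhere else (for example inside row data) is kept.
fix_dump_charset() {
    sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
        -e 's/CHARACTER SET latin1/CHARACTER SET utf8/g' \
        -e 's/COLLATE\([ =]\)latin1_[a-z0-9_]*/COLLATE\1utf8_general_ci/g'
}

# Usage:
#   fix_dump_charset < dump.sql > dump_utf8.sql
```

All latin1 collations are mapped to utf8_general_ci, the collation chosen for the database. If you write the result to a new file as in the usage line, import that file instead of dump.sql in the next step.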

And reimport the database:

$ mysql -u (user) -p --default-character-set=utf8 db_name < dump.sql

Finally, change the default character set and collation of the database itself:

$ mysql -u (user) -p \
-e "ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci"

You may also try an automatic conversion script written by Jigal van Hemert.

File Encoding Conversion

If you use TemplaVoilà!, for instance, you will certainly have to convert the encoding of all your template files. This is not difficult, as you only need the iconv command:

$ iconv -f iso-8859-1 -t utf-8 source.html > dest.html
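
With many template files, the same command can be applied in a loop. The sketch below is one possible approach, not a tested migration tool: convert_templates is a hypothetical function name, and it assumes that your templates end in .html and that every file that is not already valid UTF-8 is encoded in ISO 8859-1. Make a backup before running anything like it.

```shell
#!/bin/sh
# Hypothetical helper: converts every *.html file below the given directory
# from ISO 8859-1 to UTF-8, in place.
convert_templates() {
    dir=$1
    find "$dir" -type f -name '*.html' | while IFS= read -r file; do
        # Skip files that already decode as UTF-8 (this includes pure
        # ASCII files), so that a second run does not double-encode them.
        if iconv -f utf-8 -t utf-8 "$file" > /dev/null 2>&1; then
            continue
        fi
        # Write to a temporary file first so that a failed conversion
        # never truncates the original.
        if iconv -f iso-8859-1 -t utf-8 "$file" > "$file.tmp"; then
            mv "$file.tmp" "$file"
        else
            rm -f "$file.tmp"
        fi
    done
}

# Usage:
#   convert_templates fileadmin/templates
```

The path fileadmin/templates in the usage line is only an example; use the directory where your own templates are stored.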