UTF-8 String Handling

PHP's built-in string functions work on bytes, effectively treating all strings as ASCII. The recommended way of working with UTF-8 strings is to use the mbstring extension. Unfortunately, this extension is not always available. DokuWiki therefore ships with a library that handles UTF-8 strings in pure PHP when mbstring is not available.
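
The difference between bytes and characters can be illustrated with a minimal sketch. The helper name utf8_char_count is hypothetical (it is neither part of PHP nor of DokuWiki), and the fallback shown is just one possible pure-PHP approach, not necessarily the one DokuWiki's library uses. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration only: counts UTF-8 characters,
// preferring mbstring and falling back to a pure-PHP regular expression.
function utf8_char_count(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // The /u modifier makes PCRE treat pattern and subject as UTF-8, so "."
    // matches one code point; /s lets "." match newlines as well.
    return (int) preg_match_all('/./us', $str);
}

$word = 'Grüße';                    // "ü" and "ß" take two bytes each in UTF-8
echo strlen($word), "\n";           // 7 (bytes)
echo utf8_char_count($word), "\n";  // 5 (characters)
```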

Note: Only use UTF-8 aware functions when needed. If an operation can be done at the byte level without special care for character boundaries, use the native PHP function, as it is usually much faster.
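
As a rough illustration of which operations are safe at the byte level, the sketch below uses only native PHP functions; the sample data is made up. It relies on a property of UTF-8: ASCII bytes never occur inside a multi-byte character, and a complete UTF-8 needle can only match at a character boundary.

```php
<?php
$list = 'Äpfel;Öl;Brötchen';

// Splitting on an ASCII delimiter is safe with the plain byte-wise explode().
$items = explode(';', $list);

// Searching for and replacing complete UTF-8 strings is safe as well.
$replaced = str_replace('Öl', 'Essig', $list);

// A containment check only asks whether the needle occurs, not at which
// character position, so the byte offset returned by strpos() is irrelevant.
$hasOil = strpos($list, 'Öl') !== false;

// NOT safe: cutting at a byte offset can split a multi-byte character.
// This keeps only the first of the two bytes of "Ä" (invalid UTF-8).
$broken = substr($list, 0, 1);
```

Anything that counts, cuts, or changes the case of characters is where the UTF-8 aware functions are needed.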

The available UTF-8 aware methods can be found in the \dokuwiki\Utf8 namespace.

  • dokuwiki\Utf8\Asian provides methods to treat Asian scripts as “words”. This is mostly used to tokenize Asian texts for full text search.
  • dokuwiki\Utf8\Clean provides methods to check and clean strings. It also provides romanization for some language scripts.
  • dokuwiki\Utf8\Conversion provides methods to convert between Unicode dialects and HTML entities.
  • dokuwiki\Utf8\PhpString provides UTF-8 aware replacements for typical PHP string functions like strlen, substr, strtolower, etc. This is probably the class plugin authors will use the most.
  • dokuwiki\Utf8\Sort provides language aware sorting without the need for the intl extension (but will use it if available).
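
A plugin might use PhpString roughly as in the following sketch. The helper name teaser is hypothetical; only the PhpString::substr and PhpString::strtolower calls belong to DokuWiki. The mbstring branch exists solely so the snippet also runs outside DokuWiki, and assumes mbstring is installed there.

```php
<?php
// Hypothetical helper for illustration: returns the first $maxChars
// characters of $text in lower case, without splitting multi-byte characters.
function teaser(string $text, int $maxChars): string
{
    if (class_exists('\\dokuwiki\\Utf8\\PhpString')) {
        // Inside DokuWiki: works whether or not mbstring is available.
        $cut = \dokuwiki\Utf8\PhpString::substr($text, 0, $maxChars);
        return \dokuwiki\Utf8\PhpString::strtolower($cut);
    }
    // Outside DokuWiki (assumption: mbstring is installed).
    return mb_strtolower(mb_substr($text, 0, $maxChars, 'UTF-8'), 'UTF-8');
}

echo teaser('ÄRGER mit Umlauten', 5), "\n"; // ärger
```

The native substr() and strtolower() would cut after five bytes and leave the "Ä" in upper case here.
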
devel/utf-8.txt · Last modified: 2022-08-22 14:21 by andi

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International