
UTF-8 Encoding

DokuWiki now stores all its data in UTF-8. To avoid problems, the filenames of the data files themselves are URL-encoded when saved. DokuWiki versions older than release 2005-02-06 used different encodings, so their data files need to be re-encoded when the software is updated. Using a character set other than UTF-8 is not supported.
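To illustrate what URL-encoding a filename means, the Python sketch below percent-encodes the UTF-8 bytes of a page name so the resulting filename contains only ASCII characters. It assumes a plain `urllib.parse.quote`-style encoding with a `.txt` suffix; the helpers `page_to_filename` and `filename_to_page` are hypothetical, and DokuWiki's own rules (which characters stay unescaped, how namespaces map to directories) may differ.

```python
from urllib.parse import quote, unquote


def page_to_filename(page: str) -> str:
    # Hypothetical helper: percent-encode the UTF-8 bytes of the page name
    # (safe="" means no character is exempt except the always-safe ASCII
    # letters, digits and "_.-~").
    return quote(page, safe="", encoding="utf-8") + ".txt"


def filename_to_page(filename: str) -> str:
    # Hypothetical inverse: drop the ".txt" suffix, then decode the escapes.
    stem = filename[: -len(".txt")] if filename.endswith(".txt") else filename
    return unquote(stem, encoding="utf-8")


if __name__ == "__main__":
    name = page_to_filename("über")
    print(name)                    # %C3%BCber.txt
    print(filename_to_page(name))  # über
```

The "ü" becomes `%C3%BC` because its UTF-8 encoding is the two bytes C3 BC; plain ASCII page names such as `start` are left unchanged.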

Browser Setup for UTF-8

All modern browsers handle UTF-8 encoded web pages; it's one of the few things that work as expected in almost every browser. If your browser doesn't display some characters correctly, you are probably missing fonts that cover those Unicode characters.

Windows users should install the Arial Unicode MS font (ARIALUNI.TTF) from Microsoft. It is included with Microsoft Office.

Debian users can read my page on fonts to learn how to install Unicode fonts correctly.

Editing Files

Screenshot: saving without a BOM in Notepad2

If you intend to edit the data files directly or want to create a translation, you need a UTF-8 aware editor. There are plenty of capable editors out there; if you still need one, I recommend two small, simple and free ones1):

  • TEA – a GTK2 based editor for GNU/Linux
  • Notepad2 – a very good notepad replacement for Windows

Please note: DokuWiki does not use a Byte Order Mark (BOM), and you should make sure your editor doesn't add one either, especially when editing the PHP and config files.
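A UTF-8 BOM is simply the three bytes EF BB BF at the very start of a file, so it is easy to check for. The Python sketch below detects and removes it; the function names `has_bom`, `strip_bom` and `fix_file` are hypothetical, and you should try it on a copy of your files first.

```python
import codecs
from pathlib import Path


def has_bom(data: bytes) -> bool:
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf", the UTF-8 byte order mark.
    return data.startswith(codecs.BOM_UTF8)


def strip_bom(data: bytes) -> bytes:
    # Remove a leading UTF-8 BOM if present; otherwise return data unchanged.
    return data[len(codecs.BOM_UTF8):] if has_bom(data) else data


def fix_file(path: Path) -> bool:
    # Rewrite the file without its BOM; return True if one was removed.
    data = path.read_bytes()
    if not has_bom(data):
        return False
    path.write_bytes(strip_bom(data))
    return True
```

For example, `fix_file(Path("conf/local.php"))` would rewrite that file in place only if it starts with a BOM. Reading and writing raw bytes leaves line endings and the rest of the content untouched.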

Batch Re-encoding Files

  • On Windows, use recode, a character set conversion tool similar to iconv: http://recode.progiciels-bpi.ca/archives
    • Example of a simple conversion on a computer with a French locale:
      recode lat1..u8 test.txt

      where lat1 is the source charset (Latin-1, ISO-8859-1) and u8 is the target charset (UTF-8). recode converts the file in place.

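    • To batch the conversion on Windows, put this line in a batch file; it converts all .txt files in C:\yourpath and its sub-directories (when typing it directly at the command prompt, use %G instead of %%G):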
      FOR /F "tokens=*" %%G IN ('dir/b/S/X ^"C:\yourpath\*.txt^"') DO recode -v lat1..u8 %%~sG
  • More explanation is available at the link.
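If recode is not available, the same conversion can be sketched in Python. This is an alternative technique, not the recode tool: it assumes the old files are Latin-1 (pass another `source_encoding`, such as `cp1252`, if needed), and the names `reencode` and `convert_tree` are hypothetical. Unlike recode, it skips files that already decode as valid UTF-8, so running it twice does not double-encode anything. Back up your data directory before trying it.

```python
from pathlib import Path
from typing import List, Optional


def reencode(data: bytes, source_encoding: str = "latin-1") -> Optional[bytes]:
    # Return the content re-encoded as UTF-8, or None if it already is
    # valid UTF-8 (this includes plain ASCII files).
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError:
        return data.decode(source_encoding).encode("utf-8")


def convert_tree(root: str, source_encoding: str = "latin-1") -> List[Path]:
    # Convert every *.txt file below root (recursively, like dir /S) and
    # return the list of files that were rewritten.
    converted = []
    for path in sorted(Path(root).rglob("*.txt")):
        new_data = reencode(path.read_bytes(), source_encoding)
        if new_data is not None:
            # Writing bytes keeps line endings as they were and adds no BOM.
            path.write_bytes(new_data)
            converted.append(path)
    return converted
```

For example, `convert_tree(r"C:\yourpath")` mirrors the batch command above and returns the files it changed. Skipping valid UTF-8 is a design choice of this sketch; a rare Latin-1 file whose bytes happen to form valid UTF-8 would be left alone.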

Examples

Below are some examples of UTF-8 characters you can use to check your browser2).

Zodiac Signs: ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓
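Characters like these take more than one byte in UTF-8. The Python sketch below shows the code point and the UTF-8 byte sequence of a few sample characters, including the first zodiac sign; the helper name `utf8_bytes` is hypothetical and the snippet only illustrates the encoding, it is not part of DokuWiki.

```python
def utf8_bytes(ch: str) -> str:
    # Hex dump of the UTF-8 encoding of ch, e.g. "E2 99 88" (needs Python 3.8+).
    return ch.encode("utf-8").hex(" ").upper()


if __name__ == "__main__":
    # ASCII, Latin-1 range, a zodiac sign and a character outside the BMP.
    for ch in ["A", "ü", "♈", "𝄞"]:
        n = len(ch.encode("utf-8"))
        print(f"U+{ord(ch):04X} {ch}: {n} byte(s), {utf8_bytes(ch)}")
```

ASCII characters stay one byte, which is why plain English text looks the same in UTF-8 and Latin-1, while symbols such as ♈ need three bytes.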

A chessboard: [board with files A to H and ranks 8 to 1]

Russian (по-русски):

По оживлённым берегам
Громады стройные теснятся
Дворцов и башен; корабли
Толпой со всех концов земли
К богатым пристаням стремятся;

Ancient Greek:

Αρχαίο Πνεύμα Αθάνατον! Ἰοὺ ἰού· τὰ πάντʼ ἂν ἐξήκοι σαφῆ.

Ὦ φῶς, τελευταῖόν σε προσϐλέψαιμι νῦν,
ὅστις πέφασμαι φύς τʼ ἀφʼ ὧν οὐ χρῆν, ξὺν οἷς τʼ
οὐ χρῆν ὁμιλῶν, οὕς τέ μʼ οὐκ ἔδει κτανών.

Modern Greek:

Η σύγχρονη Ελλάδα, έχει να παρουσιάσει δυναμικό
έργο στον τομέα του πολιτισμού, των τεχνών και
των γραμμάτων. Αντίστοιχα δυναμική είναι η παρουσία
των Ελλήνων επιχειρηματιών στην διεθνή οικονομική
και βιομηχανική σκηνή.

Sanskrit:

पशुपतिरपि तान्यहानि कृच्छ्राद्
अगमयदद्रिसुतासमागमोत्कः । 
कमपरमवशं न विप्रकुर्युर्
विभुमपि तं यदमी स्पृशन्ति भावाः ॥

Hindi:

गूगल समाचार हिन्दी में

Korean:

한글은 아름다운 우리글입니다.
곱고 아름답게 사용하는 것이 우리의 의무입니다.

Chinese:

子曰:「學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?
人不知而不慍,不亦君子乎?」

有子曰:「其為人也孝弟,而好犯上者,鮮矣;
不好犯上,而好作亂者,未之有也。君子務本,本立而道生。
孝弟也者,其為仁之本與!」

Japanese:

「秋の田の かりほの庵の 苫をあらみ わが衣手は 露にぬれつつ」 天智天皇
「春すぎて 夏来にけらし 白妙の 衣ほすてふ 天の香具山」 持統天皇
「あしびきの 山鳥の尾の しだり尾の ながながし夜を ひとりかも寝む」 柿本人麻呂 

Latvian:

Iedomu jaukie ideāli,
Vecākie principi, tikla, mīla - 
Dienas allažības priekšā
Šķīst kā graudi akmeņstarpā.
Glāžšķūņa rūķīši jautri dziedādami čiepj koncertflīģeļa vāku. 

Simplified Chinese:

这是简体字汉语。 zhè shì jiǎn tǐ zì hàn yǔ 

Armenian:

Հարգանքներիս հավաստիքը Հայ Ժողովրդին:
Ամենալավ օրենքները չեն օգնի, եթե մարդիկ բանի պետք չեն:

Persian:

بنی‌آدم اعضای یک‌دیگرند / که در آفرینش ز یک گوهرند

Hebrew:

המשפט עם הזכוכית שאפשר לאכול בלי שזה מפריע, לא זוכר איך הוא הולך

1) This is intended neither as a complete list of Unicode editors nor as a selection of the best available choices. These are just two small editors I liked. Please do not add more editors.
utf-8.txt · Last modified: 2014-12-12 11:43 by ach