DokuWiki UTF8 conversion
DokuWiki has stored its data in UTF-8 encoding since release 2005-02-06. This lets you use all kinds of languages in the same wiki installation. If you upgrade from an older version, you need to re-encode your data files.
If you are installing DokuWiki for the first time, you don't need to do anything - DokuWiki will work out of the box.
You can either recode all existing pages yourself, e.g. using iconv or recode, or use the “UTF-8 conversion helper” described below.
If you do the conversion yourself, note that DokuWiki stores filenames URL-encoded, so you may have to rename your files, too.
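A rough sketch of the renaming step: the bash function below takes one URL-encoded filename, decodes its %XX escapes to raw bytes, converts those bytes with iconv (latin1 by default) and percent-encodes every non-ASCII byte again. `recode_fn` is a hypothetical name, and the sketch assumes that only non-ASCII bytes are percent-encoded, as uppercase %XX. Compare its output with a filename your upgraded DokuWiki creates itself before renaming anything.

```bash
#!/bin/bash
# Hypothetical helper: re-encode one URL-encoded filename from a legacy
# encoding (default: latin1) to URL-encoded UTF-8.
# Assumption: only non-ASCII bytes are percent-encoded, as %XX.
recode_fn() {
  local name=$1 from=${2:-latin1}
  local raw utf out="" h ch
  # Decode: turn each %XX into \xXX and let printf %b emit the raw byte
  raw=$(printf '%b' "$(printf '%s' "$name" | sed 's/%/\\x/g')")
  # Convert the decoded bytes from the old encoding to UTF-8
  utf=$(printf '%s' "$raw" | iconv -f "$from" -t UTF-8) || return 1
  # Encode again, one byte at a time
  for h in $(printf '%s' "$utf" | od -An -v -tx1); do
    if (( 16#$h >= 128 )); then
      printf -v ch '%%%02X' "$((16#$h))"  # non-ASCII byte -> %XX
    else
      printf -v ch '%b' "\\x$h"           # ASCII byte -> the character itself
    fi
    out+=$ch
  done
  printf '%s\n' "$out"
}
```

For example, `recode_fn 'k%E4se.txt'` turns the latin1 “ä” (%E4) into its two UTF-8 bytes and prints `k%C3%A4se.txt`. A rename loop could then call `mv "$f" "$(dirname "$f")/$(recode_fn "$(basename "$f")")"` for each file whose name contains a `%`.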
UTF-8 conversion helper
This script has not been updated for a long time and is not compatible with newer DokuWiki releases, so it no longer works out of the box. Have a look at the bash script below for an alternative way to upgrade old data files.
The simplest way to upgrade your data files to UTF-8 is to use the “dokuwiki-convert” script: dokuwiki-convert-latest.tgz
The script walks through your data directory and re-encodes all the files for you.
Usage
- create a backup of all your files
- upgrade your DokuWiki to the newest version as usual
- install dokuwiki-convert somewhere on your web server 1)
- edit the dokuwiki-convert/index.php file: set the full filesystem path to your DokuWiki at the very top, e.g. /var/www/dokuwiki/
- point your web browser to the dokuwiki-convert script
- choose your current file encoding
- click the “Do the conversion” button
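For the first step, one tar archive of the whole installation is enough. The helper below is a minimal sketch. `backup_wiki` is a hypothetical name, and /var/www/dokuwiki/ is only the example path from the steps above.

```bash
#!/bin/bash
# Hypothetical helper: archive a DokuWiki directory into a timestamped
# .tar.gz before converting anything. Usage: backup_wiki DIR [DEST_DIR]
backup_wiki() {
  local src=${1%/} dest=${2:-.}
  local archive
  archive="$dest/$(basename "$src")-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps the paths inside the archive relative to the parent directory
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$archive"
}
```

Running `backup_wiki /var/www/dokuwiki/ ~` writes the archive to your home directory, outside the web root, and prints its path.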
Additional Notes
- The script does not convert your old revisions.
  - You need to delete them or convert them yourself.
- The script does not convert your changes.log.
  - You need to delete it or convert it yourself.
- The script may time out when PHP runs in safe mode.
  - Just rerun it until it says it has finished.
  - If it does not work for you, you need to do the conversion yourself.
- For English wikis the script will skip a lot of files.
  - US-ASCII is a subset of UTF-8, so these files need no conversion.
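The same shortcut works when converting files yourself: a file that is already valid UTF-8, which includes every pure US-ASCII file, can be left alone. A minimal sketch, assuming an iconv that exits with a non-zero status on an invalid input sequence (as GNU iconv does). `is_utf8` and `list_unconverted` are hypothetical names. A legacy-encoded file can occasionally be valid UTF-8 by accident, so treat the check as a heuristic.

```bash
#!/bin/bash
# Hypothetical helper: succeed if FILE is already valid UTF-8
# (pure US-ASCII files pass too, since ASCII is a subset of UTF-8).
is_utf8() {
  iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Hypothetical helper: list *.txt files under DIR (default: pages/)
# that are not valid UTF-8 yet, i.e. the ones still worth converting.
list_unconverted() {
  find "${1:-pages/}" -type f -name '*.txt' | while IFS= read -r fn; do
    is_utf8 "$fn" || printf '%s\n' "$fn"
  done
}
```

Run from the data directory, `list_unconverted pages/` prints only the pages that still contain non-UTF-8 bytes.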
Sample Bash script for conversion with iconv
The following code might be helpful if you do the conversion yourself with iconv. Besides the pages, this script also converts changes.log and the old revisions in attic/. Run it from the data directory.
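```bash
#!/bin/bash
# Run from the DokuWiki data directory, and make a backup first.
set -o pipefail   # let a failing iconv inside a pipe be noticed
FROM=latin1
TO=utf8
ICONV="iconv -f $FROM -t $TO"

# Convert changes.log (restore the original if iconv fails)
cp changes.log changes.log.bak
if $ICONV < changes.log.bak > changes.log; then
  rm changes.log.bak
else
  mv changes.log.bak changes.log
fi

# Convert pages/ subdir
find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
  cp "$fn" "$fn.bak"
  if $ICONV < "$fn.bak" > "$fn"; then
    rm "$fn.bak"
  else
    echo "FAILED: $fn (original restored)" >&2
    mv "$fn.bak" "$fn"
  fi
done

# Convert attic/ subdir (where the script assumes gzip compression)
find attic/ -type f -name "*.txt.gz" | while IFS= read -r fn; do
  cp "$fn" "$fn.bak"
  if { gzip -cd | $ICONV | gzip -c; } < "$fn.bak" > "$fn"; then
    rm "$fn.bak"
  else
    echo "FAILED: $fn (original restored)" >&2
    mv "$fn.bak" "$fn"
  fi
done
```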
To use this script on Windows XP Pro (or Windows 2000 Pro) with Cygwin, for ISO8859-15 (pt_PT), I had to change the first lines of the script to:
```bash
#!/bin/bash
FROM=ISO8859-15
TO=UTF-8
```
Everything else stayed the same, and the conversion ran successfully: I was able to convert two entire DokuWiki sites in less than 5 minutes. I found the correct encoding names by running the following command at a Cygwin bash prompt:
```bash
iconv -l
```
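Besides reading through the list from `iconv -l`, you can test a single name by letting iconv try it: iconv refuses to start a conversion from an encoding it does not know, so empty input is enough. A small sketch; `encoding_supported` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if this iconv accepts NAME as a source
# encoding. Unknown names fail before any input is read.
encoding_supported() {
  iconv -f "$1" -t UTF-8 < /dev/null > /dev/null 2>&1
}
```

For example, `encoding_supported ISO8859-15 && echo "ISO8859-15 is supported"` confirms a FROM value before you edit the script.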
I have modified the script to keep the timestamps of the files in data/
— Andrea 2005-11-04 11:57
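```bash
# Convert data/ subdir, keeping each file's modification time
# ($ICONV as defined in the script above)
find data/ -type f -name "*.txt" | while IFS= read -r fn; do
  cp -p "$fn" "$fn.bak"        # -p: the backup keeps the original timestamp
  $ICONV < "$fn.bak" > "$fn"
  touch -r "$fn.bak" "$fn"     # copy the old timestamp onto the converted file
  rm "$fn.bak"
done
```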
I have modified it again so that files whose content did not change keep their original timestamps, which makes it easier to use with CVS. This version looks for *.java and *.jsp files in the current directory and its subdirectories. — Flavio 2008-01-29
```bash
#!/bin/bash
FROM=cp1252
TO=utf8
ICONV="iconv -f $FROM -t $TO"

# The parentheses make -type f apply to both name patterns
find . -type f \( -name "*.java" -o -name "*.jsp" \) | while IFS= read -r fn; do
  cp "$fn" "$fn.bak"
  touch -r "$fn" "$fn.bak"      # the backup keeps the original timestamp
  $ICONV < "$fn.bak" > "$fn"
  if cmp -s "$fn" "$fn.bak"; then
    touch -r "$fn.bak" "$fn"    # content unchanged: restore the old timestamp
  else
    echo "MODIFIED - $fn"
  fi
  rm "$fn.bak"
done
```
Manual conversion with EditPad Lite
As I couldn't get the above scripts working, I converted my pages manually, using the ANSI to UTF-8 converter in EditPad Lite.