====== DokuWiki UTF8 conversion ======
[[:DokuWiki]] has used [[:UTF-8]] encoding for storing data since release 2005-02-06. This allows you to use all kinds of languages in the same wiki installation. It also means that if you upgrade from an older version, you need to re-encode your data files.
**If you are installing DokuWiki for the first time, you don't need to do anything** - DokuWiki will work out of the box.
You can either recode all existing pages yourself, e.g. using [[man>iconv]] or [[man>recode]], or use the "UTF-8 conversion helper" described below.
If you do the conversion yourself, please note that DokuWiki stores filenames [[phpfn>rawurlencode|urlencoded]] so you may have to rename your files, too.
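If you rename files by hand, the new name is the percent-encoded form of the UTF-8 bytes. The following is only a sketch: it assumes the old names are percent-encoded ISO-8859-1 bytes and contain no literal backslashes. ''urlencode_bytes'' and ''recode_filename'' are made-up helper names, not DokuWiki functions, and the set of characters left unencoded (letters, digits, ''-'', ''_'', ''.'', ''~'') is an assumption modelled on what current PHP versions of rawurlencode leave alone.

```bash
#!/bin/bash
# Hypothetical helper: percent-encode every byte except A-Z a-z 0-9 - _ . ~
urlencode_bytes() {
  local byte ch out=""
  # od prints each input byte as two lowercase hex digits
  for byte in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$byte" in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        printf -v ch "\\x$byte"   # unreserved byte: keep the character itself
        out+=$ch ;;
      *)
        out+="%${byte^^}" ;;      # everything else becomes %XX (uppercase hex)
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: decode an old percent-encoded name, convert its bytes
# from one encoding to another, and percent-encode the result again.
# Arguments: name [from-encoding] [to-encoding]
recode_filename() {
  local from=${2:-ISO-8859-1} to=${3:-UTF-8} raw
  raw=$(printf '%b' "${1//%/\\x}" | iconv -f "$from" -t "$to") || return 1
  urlencode_bytes "$raw"
}

# Usage sketch:
#   mv '%FCber.txt' "$(recode_filename '%FCber.txt')"
```

Check the result on a copy of your data first; names that are already pure ASCII come out unchanged.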
===== UTF-8 conversion helper =====
:!: **This script has not been updated for a long time and is not compatible with newer DokuWiki releases**, so it will no longer work out of the box. Have a look at the bash script below for an alternative way to upgrade old data files.
The simplest way to upgrade your data files to UTF-8 is to use the "dokuwiki-convert" script: {{:tips:dokuwiki-convert-latest.tgz|}}
The script will walk through your data directory and re-encode all the files for you.
==== Usage ====
  - Recommended: Deny write access for all users to your wiki using the [[:ACL]] feature or a [[http://httpd.apache.org/docs/howto/htaccess.html|.htaccess]] file
  - Create a backup of all your files :!:
  - Upgrade your DokuWiki to the newest version [[:install|as usual]]
  - Install dokuwiki-convert somewhere on your webserver ((you can put it as an additional directory in your DokuWiki directory if you like))
  - Edit the ''dokuwiki-convert/index.php'' file
    * You need to set the full filesystem path to your DokuWiki at the very top, e.g. ''/var/www/dokuwiki/''
  - Point your web browser to the dokuwiki-convert script
  - Choose your current file encoding
  - Hit the ''Do the conversion'' button
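The backup step can be as simple as a tarball of the data directory. A minimal sketch, assuming ''tar'' with gzip support is available; ''backup_data'' and the archive name are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: archive a directory into a timestamped tarball.
# Arguments: source-directory [destination-directory]
# Prints the path of the archive it created.
backup_data() {
  local src=$1 dest=${2:-.}
  local archive
  archive="$dest/dokuwiki-data-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to the parent directory so the archive holds relative paths
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
    printf '%s\n' "$archive"
}

# Usage sketch (path taken from the example above):
#   backup_data /var/www/dokuwiki/data /root/backups
```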
==== Additional Notes ====
  * The script __does not__ convert your old revisions.
    * You need to delete them or convert them yourself.
  * The script __does not__ convert your changes.log.
    * You need to delete it or convert it yourself.
  * The script may time out when running in safe mode.
    * Just rerun it until it says it has finished.
    * If it does not work for you, you need to do the conversion yourself.
  * For English wikis the script will skip a lot of files.
    * US-ASCII is a subset of UTF-8, so there is no need to convert these files.
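ASCII-only files can be skipped because every byte below 0x80 means the same thing in US-ASCII and in UTF-8. If you want to check a file yourself, something like the following should do; ''is_ascii'' is a hypothetical helper and not necessarily how the conversion script decides.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the file contains only 7-bit (US-ASCII) bytes.
is_ascii() {
  # Delete every byte in the range 0-127; anything left over is non-ASCII.
  [ "$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)" -eq 0 ]
}

# Usage sketch:
#   is_ascii pages/start.txt && echo "no conversion needed"
```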
===== Sample Bash script for conversion with iconv =====
> The following code might be helpful in doing the conversion yourself with iconv. Besides converting the data dir, this script __does__ convert changes.log and the old revisions. Run this script from the data directory.
  #!/bin/bash
  # Source and target encodings (run "iconv -l" to see the names your iconv accepts)
  FROM=latin1
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Convert changes.log; the .bak copy is only removed if iconv succeeds
  cp changes.log changes.log.bak
  $ICONV < changes.log.bak > changes.log && rm changes.log.bak
  # Convert pages/ subdir (quoting keeps filenames with spaces intact)
  find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
      cp "${fn}" "${fn}.bak"
      $ICONV < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done
  # Convert attic/ subdir (where the script assumes gzip compression)
  find attic/ -type f -name "*.txt.gz" | while IFS= read -r fn; do
      cp "${fn}" "${fn}.bak"
      { gzip -cd | $ICONV | gzip -c; } < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done
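One pitfall when converting with iconv: running a Latin-1 to UTF-8 conversion a second time over the same files re-encodes text that is already UTF-8 and garbles all non-ASCII characters. A possible guard is sketched here, assuming your iconv exits with a non-zero status on invalid input (GNU iconv does). ''is_utf8'' is a made-up helper name. Note that a short Latin-1 file can, in rare cases, happen to be valid UTF-8 as well, so treat this as a heuristic.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
# Pure ASCII files also pass, which is fine: they need no conversion.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Usage sketch inside a conversion loop:
#   if is_utf8 "$fn"; then echo "SKIPPED - $fn"; continue; fi
```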
> To use this script on Windows XP Pro (or Windows 2000 Pro) with Cygwin, for ISO8859-15 (pt_PT), I had to change the first lines of the script to:
  #!/bin/bash
  FROM=ISO8859-15
  TO=UTF-8
> Everything else remains the same, and the conversion ran successfully. I was able to convert two entire DokuWiki-enabled sites in less than 5 minutes. I found the correct encoding names by running the following command at a Cygwin bash prompt:
  iconv -l
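The list printed by ''iconv -l'' is long, so it helps to filter it. A small sketch; the helper name ''find_encoding'' is made up, and since the output format of ''iconv -l'' differs between iconv implementations, the cleanup steps are a best guess.

```bash
#!/bin/bash
# Hypothetical helper: list the encoding names known to iconv that contain
# the given text (case-insensitive), one name per line.
find_encoding() {
  # Split on commas and spaces, drop a trailing "//" if present, then filter.
  iconv -l | tr ', ' '\n\n' | sed 's,//$,,' | grep -i -- "$1" | sort -u
}

# Usage sketch:
#   find_encoding 8859-15
```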
> I have modified the script to keep the timestamps for the files in ''data/'' --- //[[andrea@gualano.net|Andrea]] 2005-11-04 11:57//
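  # Convert data/ subdir, keeping the original modification times
  find data/ -type f -name "*.txt" | while IFS= read -r fn; do
      cp -p "${fn}" "${fn}.bak"
      $ICONV < "${fn}.bak" > "${fn}"
      # Copy the timestamp of the backup onto the converted file
      touch -r "${fn}.bak" "${fn}"
      rm "${fn}.bak"
  done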
> I have modified it again so that unmodified files keep their timestamps, for easier use with CVS. It looks for *.java and *.jsp files in the current directory and its subdirectories. --- //[[fbotelho@stj.gov.br|Flavio]] 2008-01-29//
  #!/bin/bash
  FROM=cp1252
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Group the -name tests so that -type f applies to both patterns
  find . -type f \( -name "*.java" -o -name "*.jsp" \) | while IFS= read -r fn; do
      cp "${fn}" "${fn}.bak"
      touch -r "${fn}" "${fn}.bak"
      $ICONV < "${fn}.bak" > "${fn}"
      # cmp -s is silent and succeeds only if both files are identical
      if cmp -s "${fn}" "${fn}.bak"; then
          touch -r "${fn}.bak" "${fn}"
      else
          echo "MODIFIED - ${fn}"
      fi
      rm "${fn}.bak"
  done
===== Manual conversion with EditPad Lite =====
As I couldn't get the above scripts working, I converted my pages manually, using the ANSI to UTF-8 converter from [[http://www.editpadpro.com/editpadlite.html|EditPad Lite]].