====== DokuWiki UTF8 conversion ======

[[:DokuWiki]] uses [[:UTF-8]] encoding for storing data since release 2005-02-06. This allows you to use all kinds of languages in the same wiki installation. It also means that if you upgrade from an older version, you need to re-encode your data files.

**If you are installing DokuWiki for the first time, you don't need to do anything**: DokuWiki will work out of the box.

You can either recode all existing pages yourself, e.g. using [[man>iconv]] or [[man>recode]], or use the "UTF-8 conversion helper" described below. If you do the conversion yourself, please note that DokuWiki stores filenames [[phpfn>rawurlencode|urlencoded]], so you may have to rename your files, too.

===== UTF-8 conversion helper =====

:!: **This script has not been updated for a long time and is not compatible with newer DokuWiki releases**, so it will not work out of the box anymore. Have a look at the bash script below for an alternative way to upgrade old data files.

The simplest way to upgrade your data files to UTF-8 is to use the "dokuwiki-convert" script:

{{:tips:dokuwiki-convert-latest.tgz|}}

The script walks through your data directory and re-encodes all the files for you.

==== Usage ====

  - Recommended: deny write access to your wiki for all users, using the [[:ACL]] feature or a [[http://httpd.apache.org/docs/howto/htaccess.html|.htaccess]] file
  - Create a backup of all your files :!:
  - Upgrade your DokuWiki to the newest version [[:install|as usual]]
  - Install dokuwiki-convert somewhere on your web server ((you can put it as an additional directory in your DokuWiki directory if you like))
  - Edit the ''dokuwiki-convert/index.php'' file
    * You need to set the full filesystem path to your DokuWiki at the very top, e.g. ''/var/www/dokuwiki/''
  - Point your web browser to the dokuwiki-convert script
  - Choose your current file encoding
  - Hit the ''Do the conversion'' button

==== Additional Notes ====

  * The script __does not__ convert your old revisions.
    * You need to delete them or convert them yourself.
  * The script __does not__ convert your changes.log.
    * You need to delete it or convert it yourself.
  * The script may time out when running in safe mode.
    * Just rerun it until it says it has finished.
    * If it does not work for you, you need to do the conversion yourself.
  * For English wikis the script will skip a lot of files.
    * US-ASCII is a subset of UTF-8, so there is no need to convert these files.

===== Sample Bash script for conversion with iconv =====

> The following code might be helpful in doing the conversion yourself with iconv. Besides converting the data dir, this script __does__ convert changes.log and the old revisions. Run this script from the data directory.

  #!/bin/bash
  FROM=latin1
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Convert changes.log (the backup is kept if iconv fails)
  cp changes.log changes.log.bak
  $ICONV < changes.log.bak > changes.log && rm changes.log.bak
  # Convert pages/ subdir
  find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done
  # Convert attic/ subdir (where the script assumes gzip compression)
  find attic/ -type f -name "*.txt.gz" | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    { gzip -cd | $ICONV | gzip -c; } < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done

> To use this script on Windows XP Pro (or Windows 2000 Pro) with Cygwin, for ISO8859-15 (pt_PT), I had to change the first lines of the script to:

  #!/bin/bash
  FROM=ISO8859-15
  TO=UTF-8

> Everything else remains the same, and the conversion ran successfully. I've been able to convert two entire DokuWiki-enabled sites in less than 5 minutes.
> I found out about the correct encoding names by issuing the following command at a Cygwin Bash prompt:

  iconv -l

> I have modified the script to keep the timestamps of the files in ''data/''. --- //[[andrea@gualano.net|Andrea]] 2005-11-04 11:57//

  # Convert data/ subdir
  find data/ -type f -name "*.txt" | while IFS= read -r fn; do
    cp -p "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}"
    touch -r "${fn}.bak" "${fn}"
    rm "${fn}.bak"
  done

> I have modified it again so that unmodified files keep their timestamps, which makes it easier to use with CVS. It looks for ''*.java'' and ''*.jsp'' files in the current directory and its subdirectories. --- //[[fbotelho@stj.gov.br|Flavio]] 2008-01-29//

  #!/bin/bash
  FROM=cp1252
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Group the -name tests so that -type f applies to both patterns
  find . -type f \( -name "*.java" -o -name "*.jsp" \) | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    touch -r "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}"
    # cmp -s prints nothing and succeeds only if the files are identical
    if cmp -s "${fn}" "${fn}.bak"; then
      touch -r "${fn}.bak" "${fn}"
    else
      echo "MODIFIED - ${fn}"
    fi
    rm "${fn}.bak"
  done

===== Manual conversion with EditPad Lite =====

As I couldn't get the above scripts working, I converted my pages manually, using the ANSI to UTF-8 converter from [[http://www.editpadpro.com/editpadlite.html|EditPad Lite]].
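
===== Sketch: skipping files that are already UTF-8 =====

The notes above point out that US-ASCII files need no conversion. Likewise, running an iconv conversion a second time over files that are already UTF-8 would garble them. The following is a minimal sketch, not a tested replacement for the scripts above, that checks each file before converting it. The function names ''is_utf8'' and ''convert_file'' and the default ''FROM'' value are assumptions made for illustration. The check relies on ''iconv'' exiting with an error when its input is not valid in the source encoding.

```shell
#!/bin/bash
# Assumption: the old encoding is ISO-8859-1 unless FROM is set by the caller.
FROM="${FROM:-ISO-8859-1}"

# Succeeds if the file already decodes cleanly as UTF-8 (pure US-ASCII included).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Convert one file in place, unless it is already valid UTF-8.
# Hypothetical helper; prints what it did for each file.
convert_file() {
    local fn="$1"
    if is_utf8 "$fn"; then
        echo "SKIPPED - $fn"
        return 0
    fi
    cp -p "$fn" "$fn.bak"
    if iconv -f "$FROM" -t UTF-8 < "$fn.bak" > "$fn"; then
        touch -r "$fn.bak" "$fn"    # keep the original timestamp
        rm "$fn.bak"
        echo "CONVERTED - $fn"
    else
        mv "$fn.bak" "$fn"          # restore the untouched original
        echo "FAILED - $fn" >&2
        return 1
    fi
}

# Usage, from the data directory:
#   find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
#       convert_file "$fn"
#   done
```

The UTF-8 check is a heuristic: a file in an old 8-bit encoding can, in rare cases, happen to be valid UTF-8 as well, so a backup of the data directory is still advisable before running anything like this.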