
Tips and Tricks

Romanize filenames

Keywords: UTF-8, romanize, cyrillic, latin, convert, filename

When upgrading from an older version that did not yet have the “romanize” function, you will find a completely 'unreadable' directory structure in your data directory.

For example: %D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt is the same as кыргызстан.txt

This is because UTF-8 filenames have been urlencoded.
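
To see the encoding at work, the following minimal sketch round-trips the example name with PHP's built-in `urldecode()` and `rawurlencode()`. The variable names are illustrative only, and the snippet does not depend on DokuWiki.

```php
<?php
// Illustrative only: round-trip the example filename through URL encoding.
$encoded = '%D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt';

// Decode the percent-escaped bytes back into the UTF-8 name.
$decoded = urldecode($encoded);
echo $decoded . "\n";

// Encoding the readable name again yields the form found on disk.
$reencoded = rawurlencode($decoded);
echo $reencoded . "\n";
```

The second line printed should match the original encoded name, which is why the two spellings refer to the same page.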

Later versions added the “romanization” option to circumvent this problem. 1)

The script below will convert this unreadable directory structure to “romanized” filenames.

You will have to include the utf8.php file, which is part of the DokuWiki installation.

Note: this script is not error free. For example, some Cyrillic characters will leave a “'” at the end of your filename, because utf8.php transliterates 'ъ' as “'”.
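
The script on this page replaces runs of apostrophes with an underscore as the last step of its `cleanID()` helper. A minimal sketch of that step in isolation is shown here; the sample names are made up for illustration and are not output of utf8.php.

```php
<?php
// Hypothetical romanized names; each apostrophe stands in for a transliterated 'ъ'.
$names = array("ob'ekt.txt", "pod''ezd.txt", "plain.txt");

$cleaned = array();
foreach ($names as $name) {
    // Same pattern as in cleanID(): one or more apostrophes become one underscore.
    $cleaned[$name] = preg_replace('#\'+#', '_', $name);
    echo $name . ' -> ' . $cleaned[$name] . "\n";
}
```

Because the pattern uses `+`, two adjacent apostrophes collapse into a single underscore rather than two.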

Please check your page structure after conversion for invalid filenames.
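
One way to do that check is to scan the converted tree for names containing unexpected characters. The sketch below is an assumption-laden helper, not part of DokuWiki: the function name `findSuspectNames()` is hypothetical, and the allowed character set (lowercase ASCII letters, digits, `.`, `_`, `-`) is a guess at what a clean name looks like, so adjust it to your setup.

```php
<?php
/**
 * List files and directories under $root whose names contain anything other
 * than lowercase ASCII letters, digits, '.', '_' or '-'.
 * The allowed character set is an assumption, not a DokuWiki rule.
 */
function findSuspectNames($root)
{
    $suspect = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST
    );
    foreach ($iterator as $item) {
        // Check only the last path component, not the whole path.
        if (preg_match('/[^a-z0-9._-]/', $item->getFilename())) {
            $suspect[] = $item->getPathname();
        }
    }
    sort($suspect);
    return $suspect;
}

// Usage: the path is a placeholder taken from the script on this page.
// foreach (findSuspectNames('/dokuwiki/data/pagesnew') as $path) {
//     echo $path . "\n";
// }
```

Anything the function reports (leftover apostrophes, uppercase letters, untransliterated UTF-8 bytes) is worth renaming by hand before pointing the wiki at the new directory.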

I hope this will help someone. Any improvements welcome.

Update: utf8.php has been rewritten; the code below has only been tested with this version of utf8.php.

<?php
 
include("utf8.php"); //to be found in the \inc directory of the default dokuwiki install 
 
/**
 * Copy a file, or recursively copy a folder and its contents, and clean up the filenames according to the DokuWiki UTF-8 romanization
 *
 * @original_author      Aidan Lister <aidan@php.net>
 * @link        http://aidanlister.com/repos/v/function.copyr.php
 * @param       string   $source    Source path
 * @param       string   $dest      Destination path
 * @return      bool     Returns TRUE on success, FALSE on failure
 */
function copyr($source, $dest)
{
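    // Clean the destination path (urldecode, lowercase, romanize)
    $dest2 = cleanID($dest);
    echo $source."->".$dest." ->$dest2<br/>\n";
    // Simple copy for a file
    if (is_file($source)) {
        return copy($source, $dest2);
    }
 
    // Make destination directory (test the cleaned path, which is the one that gets created)
    if (!is_dir($dest2)) {
        mkdir($dest2);
    }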
 
    // Loop through the folder
    $dir = dir($source);
    while (false !== ($entry = $dir->read())) {
        // Skip pointers
        if ($entry == '.' || $entry == '..') {
            continue;
        }
 
        // Deep copy directories
        if ($dest !== "$source/$entry") {
            copyr("$source/$entry", "$dest/$entry");
        }
    }
 
    // Clean up
    $dir->close();
    return true;
}
 
copyr("/dokuwiki/data/pages/","/dokuwiki/data/pagesnew/");
 
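function cleanID($id,$ascii=false){
  // Decode %D0%BA... sequences back into UTF-8
  $id = trim(urldecode($id));
  $id = utf8_strtolower($id);
  // Transliterate non-Latin characters to Latin
  $id = utf8_romanize($id);
  // utf8_deaccent() returns its result, so it must be assigned (-1 = lowercase only)
  $id = utf8_deaccent($id,-1);
  // Replace apostrophes left over from transliteration
  $id = preg_replace('#\'+#','_',$id);
  return($id);
}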
 
?>
1) See deaccent and romanization for more info.
tips/romanize.txt · Last modified: 2020-02-22 21:38 by Aleksandr

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International