ErfurtWiki (ewiki) to DokuWiki Converter

The PHP script below is a modest attempt at converting ErfurtWiki (ewiki) pages to DokuWiki format. It is a modified version of moinmoin2doku (thanks!). It has not been tested much: it was only used to convert a few ewiki installations with about 400 pages. After the conversion, the new pages still need manual editing.

Requirements

  • PHP (command-line version)
  • ewiki flat files (if you run ewiki on an SQL database, you first need to export the pages to plain text files)

Capabilities

It is able to transform

  • Headings
  • Links (most of them, including most CamelCase links)
  • Bold/italic/monospaced/teletype
  • Big/small text (big text becomes bold; the small-text markup is removed)
  • Lists
  • <pre> code blocks
  • InterWiki links (see code and customize for your needs)
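To illustrate how these transformations work, here is a minimal sketch that copies nine of the script's rules verbatim (lists, ordered lists, headings, bold, italic, monospaced) and keeps them in the same relative order. `convertEwikiLine()` is a hypothetical name used only for this example. `preg_replace()` with arrays applies each pattern to the result of the previous one, which is why italic has to run before monospaced: otherwise the `''...''` produced for `==mono==` would be turned into `//...//` as well.

```php
<?php
// Cut-down version of ewiki2doku(): a few of its rules, copied verbatim
// and kept in the same relative order, applied to a single line.
function convertEwikiLine(string $line): string {
    $find = [
        '/^\* /',                     // lists 1
        '/^\*\* /',                   // lists 2
        '/^# /',                      // ordered lists 1
        '/^!{3} ?(.*)$/',             // heading 1
        '/^!{2} ?(.*)$/',             // heading 2
        '/^!{1} ?(.*)$/',             // heading 3
        '/__([^_]+)__/',              // bold
        '/\'\'([^\']+)\'\'/',         // italic: must run before monospaced
        '/==(([^= ][^=]+)|[^=])==/',  // monospaced: emits ''...''
    ];
    $replace = [
        '  * ',
        '    * ',
        '  - ',
        '====== ${1} ======',
        '===== ${1} =====',
        '==== ${1} ====',
        '**${1}**',
        '//${1}//',
        '\'\'${1}\'\'',
    ];
    // Each pattern is applied in turn, so later rules see earlier results.
    return preg_replace($find, $replace, $line);
}

foreach (['!!! Title', '** sub', "__bold__ and ''it'' and ==mono=="] as $line) {
    echo convertEwikiLine($line), "\n";
}
```

The full script works the same way, just with more rules, so a new rule has to be inserted at a position where its input has not already been rewritten by an earlier rule.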

Missing features/bugs

It is not able to transform

  • Internal images
  • <pre> blocks properly (they are only converted to <code> blocks)
  • and probably quite a few more things; test first and use at your own risk
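One concrete reason why only "most" CamelCase words become links: the CamelCase pattern in ewiki2doku() has to consume one character before the word and one after it. A CamelCase word at the very start of a line has no preceding character and is skipped, and when two CamelCase words are separated by a single space, the first match uses up that space, so the second word is skipped too. The sketch below reproduces the rule in isolation (the `<cc>` marker step and its finish step folded into one replacement) and compares it with a hypothetical variant that uses zero-width lookbehind/lookahead assertions instead. Both function names are made up for this example, the variant has not been tried on real ewiki pages, and the CamelCase InterWiki rule would need the same change.

```php
<?php
// The CamelCase rule as used in ewiki2doku(); the guard characters before
// and after the word are captured, i.e. consumed by the match.
function linkCamelOriginal(string $line): string {
    $pattern = '/([^-!~=|>&[])(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)(([^]|#>])|$)/';
    return preg_replace($pattern, '${1}[[${2}]]${3}', $line);
}

// Hypothetical variant: the same guards expressed as lookarounds, which
// check the neighbouring characters without consuming them.
function linkCamelLookaround(string $line): string {
    $pattern = '/(?<![-!~=|>&[])(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)(?![]|#>])/';
    return preg_replace($pattern, '[[${1}]]', $line);
}

foreach (['FooBar at start', 'x FooBar BazQux', '~FooBar'] as $line) {
    echo linkCamelOriginal($line), '  |  ', linkCamelLookaround($line), "\n";
}
```

With the lookarounds, `FooBar at start` and both words in `x FooBar BazQux` are linked, while escaped words such as `~FooBar` are still left alone.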

Source

File ewiki2doku.php:

#!/usr/bin/php
<?php
// ewiki2doku.php
 
// Use at your own risk! No warranty implied!

//check command line parameters
if ($argc != 3 || in_array($argv[1], array('--help', '-help', '-h', '-?'))) {
  echo "\n  Converts all files from given directory\n";
  echo "  from ErfurtWiki to DokuWiki syntax. NOT RECURSIVE\n\n";
  echo "  Usage:\n";
  echo "  ".$argv[0]." <input dir> <output dir>\n\n";
}
else {
  //get input and output directories (both must already exist)
  $inDir = realpath($argv[1]) or die("input dir error\n");
  $outDir = realpath($argv[2]) or die("output dir error\n");
  if (!is_dir($inDir) || !is_dir($outDir)) {
    die("input and output must both be directories\n");
  }
  //just print information
  echo "\nInput Directory: ".$inDir."\n";
  echo "Output Directory: ".$outDir."\n\n";

  //get all files from directory
  $files = filesFromDir($inDir);

  //migrate each file
  foreach ($files as $file) {
    //convert filename
    $ofile = convFileNames($file);
    //just print information
    echo "Migrating from ".$inDir."/".$file." to ".$outDir."/".$ofile."\n";

    //read input file
    $text = readFl($inDir."/".$file);

    //convert content
    $text = ewiki2doku($text);

    //encode in UTF-8; ewiki files are assumed to be ISO-8859-1 (Latin-1),
    //same as the deprecated utf8_encode() (needs the mbstring extension)
    $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');

    //write output file
    writeFl($outDir."/".$ofile, $text);
  }
}

function ewiki2doku($text) {

  //conversion rules: $find[i] is replaced by $replace[i], in this order, on
  //every line (the order matters: later rules see the results of earlier ones)
  $find = Array(
     '/\[notify: ?[^ ]*\]/',         //remove [notify:...]
     '/\[jump:([^]]+)\]/',           //[jump:...]
     '/<\?plugin *settitle(.*)\?>/i', //sort of a heading 1
     '/^    *([^ ])/',               //indented paragraphs: 3 or more leading spaces (we always used 4; [tab] is not matched)
     '/%%%/',                        //newline
     '/([^!~=|[])(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b):(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)(([^]|#])|$)/',
                                     //CamelCase InterWiki link
     '/([^-!~=|>&[])(\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)(([^]|#>])|$)/', //CamelCase, don't change if CamelCase is in InternalLink
     '/([^!~]|^)\[([^] |[]+)\]/',    //internal link
     '/\[([^]|[]+)\|([^]|[]+)\]/',   //external links and links with |
     '/\["([^"]+)" ([^ ]+)\]/',      //Ewiki ["..." ...] style links ([... "..."] not recognized)
     '/\[\[([^ :]+):([^]\/@]+)\]\]/', //InterWiki link (the /@ tries to exclude http:// and mailto:)
     '/\[\[(([^] |[]+)\.(png|jpe?g|gif))\]\]/', //image link (only some)
     '/<pre>/',                      //pre open
     '/<\/pre>/',                    //pre close
     '/^\* /',                       //lists 1
     '/^\*\* /',                     //lists 2
     '/^\*\*\* /',                   //lists 3
     '/^# /',                        //ordered lists 1
     '/^## /',                       //ordered lists 2
     '/^### /',                      //ordered lists 3
     '/^!{3} ?(.*)$/',               //heading 1
     '/^!{2} ?(.*)$/',               //heading 2
     '/^!{1} ?(.*)$/',               //heading 3
     '/__([^_]+)__/',                //bold 1
     '/\*\*([^*]+)\*\*/',            //bold 2
     '/\'\'([^\']+)\'\'/',           //italic (emphasize)
     '/==(([^= ][^=]+)|[^=])==/',    //monospaced (also taking care of ==X==)
     '/<tt>(.+)<\/tt>/',             //teletype
     '/##([^#]+)##/',                //big text
     '/\xB5\xB5([^\xB5]+)\xB5\xB5/', //small text (µµ...µµ, µ is byte 0xB5 in Latin-1)
     '/[!~](\b[A-Z]+[a-z]+[A-Z][A-Za-z]*\b)/', //~CamelCase + !CamelCase
     '/[!~](\[[^][]+\])/',           //~[text] + ![text] (just remove ~ and !)
     '/<cc>([A-Z]+[a-z]+[A-Z][A-Za-z>]*)<\/cc>/', //CamelCase, finish the <cc> marker set above
     '/^(=+ .*)\[\[(.*)\]\](.* =+)$/',   //remove links in headlines
     '/<([-A-Za-z0-9+_.]+@[-A-Za-z0-9_]+\.[-A-Za-z0-9_.]+[A-Za-z])>/', //<email> addresses
     '/([^<:!~]|^)(\b[-A-Za-z0-9+_.]+@[-A-Za-z0-9_]+\.[-A-Za-z0-9_.]+[A-Za-z]\b)([^>]|$)/', //email addresses
     '/^keywords: /',                //misc1
     '/\[\[ManPages>/',              //misc2
     '/\[\[WikiPedia>/',             //misc3
     '/\[\[FooBarWiki>/'             //misc4
     );
  $replace = Array(
     '',                             //remove [notify:...]
     'Please go to [${1}]',          //[jump:...]
     '====== ${1} ======',           //heading 1 (from plugin settitle)
     '> ${1}',                       //indented paragraphs
     '\\\\\\ ',                      //newline
     '${1}<cc>${2}>${3}</cc>${4}',   //CamelCase InterWiki link
     '${1}<cc>${2}</cc>${3}',        //CamelCase (preparation, see below for finish)
     '${1}[[${2}]]',                 //internal link
     '[[${2}|${1}]]',                //external link and links with |
     '[[${2}|${1}]]',                //Ewiki ["..." ...] style links
     '[[${1}>${2}]]',                //InterWiki link
     '{{${1}}}',                     //images link
     '<code>',                       //(<pre>) code open
     '<'.'/code>',                   //(</pre>) code close, split in two so this listing can be shown inside DokuWiki
     '  * ',                         //lists 1
     '    * ',                       //lists 2
     '      * ',                     //lists 3
     '  - ',                         //ordered lists 1
     '    - ',                       //ordered lists 2
     '      - ',                     //ordered lists 3
     '====== ${1} ======',           //heading 1
     '===== ${1} =====',             //heading 2
     '==== ${1} ====',               //heading 3
     '**${1}**',                     //bold 1
     '**${1}**',                     //bold 2
     '//${1}//',                     //italic (emphasize)
     '\'\'${1}\'\'',                 //monospaced
     '\'\'${1}\'\'',                 //teletype
     '**${1}**',                     //big text -- no markup in dokuwiki
     '${1}',                         //small text -- no markup in dokuwiki
     '${1}',                         //~CamelCase + !CamelCase
     '${1}',                         //~[text] + ![text] (just remove ~ and !)
     '[[${1}]]',                     //CamelCase, finish <cc>CamelCase</cc>
     '${1}${2}${3}',                 //remove links in headlines
     '${1}',                         //<email> addresses
     '${1}<${2}>${3}',               //email addresses
     '**keywords:** ',               //misc1
     '[[man>',                       //misc2
     '[[wp>',                        //misc3
     '[[FooBarWiki>'                 //misc4
     );

  //convert line by line
  $ret = '';
  foreach (explode("\n", $text) as $line) {
    $ret .= preg_replace($find, $replace, $line)."\n";
  }
  return $ret;
}
 
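function convFileNames($name) {
  /* special characters appear in ewiki file names as _xx (hex value):
     space, _, ., &, - and the German umlauts Ä, Ö, Ü, ä, ö, ü, ß
  */
  $find = Array('/_20/',
                '/_5f/',
                '/_2e/',
                '/_c4/',
                '/_d6/',
                '/_dc/',
                '/_e4/',
                '/_f6/',
                '/_fc/',
                '/_df/',
                '/_26/',
                '/_2d/'
                );
  $replace = Array('_',
                   '_',
                   '_',
                   'ae',
                   'oe',
                   'ue',
                   'ae',
                   'oe',
                   'ue',
                   'ss',
                   '_',
                   '-'
                   );
  $name = preg_replace($find,$replace,$name);
  $name = strtolower($name);
  return $name.".txt";
}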
 
 
function filesFromDir($dir) {
  $files = Array();
  $handle = opendir($dir);
  //compare with false explicitly: a file named "0" would otherwise end the loop
  while (false !== ($file = readdir($handle))) {
     if ($file != "." && $file != ".." && !is_dir($dir."/".$file)) {
         array_push($files, $file);
     }
  }
  closedir($handle);
  return $files;
}
 
function readFl($file) {
  //whole file as a string, or an empty string if it cannot be read
  $text = file_get_contents($file);
  return ($text === false) ? '' : $text;
}

function writeFl($file, $text) {
  if (file_put_contents($file, $text) === false) {
    echo "Could not write ".$file."\n";
  }
}
 
?>
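convFileNames() only knows a fixed list of `_xx` escapes, and because it applies them one after another, a decoded `_` can combine with the following characters into a new escape. A more general sketch decodes all escapes in a single pass with `preg_replace_callback()`. It assumes (inferred from the table in convFileNames(), not from ewiki documentation) that every `_xx` in a file name is one Latin-1 byte in lowercase hex; `ewikiFileToDokuPage()` is a hypothetical name.

```php
<?php
// Hypothetical, more general variant of convFileNames().
// Assumption: every "_xx" in an ewiki file name is an escaped Latin-1 byte
// in lowercase hex, e.g. "_20" = space, "_5f" = "_", "_fc" = "ü".
function ewikiFileToDokuPage(string $name): string {
    // 1. Decode all "_xx" escapes in one pass; the output is not re-scanned.
    $raw = preg_replace_callback(
        '/_([0-9a-f]{2})/',
        fn(array $m): string => chr(hexdec($m[1])),
        $name
    );
    // 2. Transliterate German umlauts and sharp s (Latin-1 byte values).
    $raw = strtr($raw, [
        "\xC4" => 'Ae', "\xD6" => 'Oe', "\xDC" => 'Ue',
        "\xE4" => 'ae', "\xF6" => 'oe', "\xFC" => 'ue', "\xDF" => 'ss',
    ]);
    // 3. Lowercase, then turn every byte other than a-z, 0-9 or "-" into "_"
    //    (spaces, dots and "&" end up as "_", as in the original table).
    $id = preg_replace('/[^a-z0-9-]/', '_', strtolower($raw));
    return $id . '.txt';
}

echo ewikiFileToDokuPage('Gr_fc_dfe_20aus_20M_fcnchen'), "\n";
```

For names that only use the escapes from the original table, this sketch should give the same page names as convFileNames(); other characters are mapped to `_` instead of being left encoded.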

Changelog

  • Fixed a CamelCase issue; added support for more e-mail address formats and for the settitle plugin
tips/ewiki2doku.txt · Last modified: 2010-09-02 09:07 by mluigi

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International