How to convert docs to DokuWiki

I was googling for conversion tools.

Luckily, I came across this article:

http://www.linux.com/articles/61713

Main goal: automatic conversion in a bureaucratic environment

    *.doc --> *.html --> *.txt (wiki syntax)

The steps below describe the overall scheme:

Step 0 | Preparing the environment

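Dependencies: bash, Perl, Java, OpenOffice.org (soffice), JODConverter 2.2.1 and html2wiki (HTML::WikiConverter with the DokuWiki dialect).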
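Before going further it can help to check that the command-line tools used on this page are installed. The following is only a minimal sketch: `missing_tools` is a hypothetical helper name, and the tool names are the commands that appear in the scripts and command lines of this page.

```shell
#!/bin/bash
# Report which of the given commands are not on the PATH.
# Prints one missing name per line; returns non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Command names taken from the scripts on this page.
missing_tools soffice java perl html2wiki ||
    echo "Install the tools listed above first." >&2
```

The JODConverter jar is not a command, so it is not covered here; oocwiki.sh checks for it separately through the JODCON variable.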

Code needed

Three files:

  1. The main bash script: oocwiki.sh
  2. The cleaning bash script: cleanfolder.sh
  3. The renaming / automatic conversion loop Perl script: oocwiki.pl

Copy the code of each file into a folder on your computer.

Folders :

Create a folder containing your MS Word files:

MS Word environment:

ENWOLRD=/home/massou/Documents/oldies/

Then set, in the bash script, the parameters for the other folders and files we need:

Temp folder:

TMPOOCWIKI=/tmp/oocwiki/

JODConverter jar:

JODCON=/home/massou/Documents/perl/jodconverter-2.2.1/lib/jodconverter-cli-2.2.1.jar

DokuWiki transfer folders:

OUTWIKI=/srv/www/htdocs/dokuwiki/data/pages/outdoc/
OUTMEDIA=/srv/www/htdocs/dokuwiki/data/media/outdoc/

Then use this bash script:

oocwiki.sh
#!/bin/bash
# script oocwiki.sh
# 
# Usage: bash oocwiki.sh   (run as root; the paths are set in the variables below)

# This script converts a folder of MS Word files into DokuWiki pages.
 
# Change the values of the variables to make the script work for you:
ENWOLRD=/home/massou/Documents/oldies/
TMPOOCWIKI=/tmp/oocwiki/
JODCON=/home/massou/Documents/perl/jodconverter-2.2.1/lib/jodconverter-cli-2.2.1.jar
OUTWIKI=/srv/www/htdocs/dokuwiki/data/pages/outdoc/
OUTMEDIA=/srv/www/htdocs/dokuwiki/data/media/outdoc/
 
if [ $(whoami) != 'root' ]; then
        echo "Must be root to run $0"
        exit 1;
fi
# if [ -z $1 ]; then
#         echo "Usage: $0 </path/to/httpd.conf>"
#         exit 1
# fi
 
 
parameters=("$ENWOLRD" "$TMPOOCWIKI" "$OUTWIKI" "$OUTMEDIA")
## are the parameters ok?
for i in "${parameters[@]}"; do
	if [ ! -e "${i}" ]; then
		echo "${i} doesn't exist"
		mkdir -p "${i}"
		echo "${i} created"
	elif [ -f "${i}" ]; then
		echo "${i} is a file"
	elif [ -d "${i}" ]; then
		echo "${i} seems ready"
	fi

done
 
if [ ! -e "$JODCON" ]; then
	echo "$JODCON doesn't exist"
	exit 1
elif [ -f "$JODCON" ]; then
	echo "$JODCON is ready"
fi

# Start OpenOffice.org as a headless service if it is not already running
pgrep soffice > /dev/null
retval=$?
if [ "$retval" = 1 ]
then
	echo "soffice doesn't seem to be running..."
	soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
fi
 
### cleaning and copy

parameters=("$TMPOOCWIKI" "$OUTWIKI" "$OUTMEDIA")
## empty the working and output folders
for i in "${parameters[@]}"; do
	if [ -e "${i}" ]; then
		echo "${i} exists, emptying it"
		rm -R "${i}"
		mkdir "${i}"
		echo "${i} recreated"
	fi
done
 
 
cp -R "$ENWOLRD"/* "$TMPOOCWIKI"


################### Step 1 Some cleaning ##################
bash ./cleanfolder.sh "$TMPOOCWIKI"



######################### Step 2-3 Time for Perl #################

perl oocwiki.pl "$TMPOOCWIKI" "$JODCON"

######################### Step 4 Copy of the files #################

cp -R "$TMPOOCWIKI"/* "$OUTWIKI"
cp -R "$TMPOOCWIKI"/* "$OUTMEDIA"
 
########### Step 5 time for ACL #########

parameters=("$OUTWIKI" "$OUTMEDIA")
## fix owner, group and mode
for i in "${parameters[@]}"; do
	chown -R wwwrun "${i}"
	chgrp -R www "${i}"
	chmod -R 775 "${i}"
done
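
The script starts soffice in the background and goes straight on to the conversion, so the first files may be processed before the service is listening. One possible guard is sketched below. It is an assumption, not part of the original script: `wait_for_port` is a hypothetical helper, and it relies on bash's /dev/tcp redirection, so it only works under bash.

```shell
#!/bin/bash
# Wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT TRIES   (one try per second)
# Returns 0 as soon as a connection succeeds, 1 after TRIES failures.
wait_for_port() {
    host=$1; port=$2; tries=$3
    while [ "$tries" -gt 0 ]; do
        # The subshell opens and immediately closes a test connection.
        if (exec 3<>"/dev/tcp/$host/$port") 2> /dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

In oocwiki.sh it could be called right after the soffice line, for example `wait_for_port 127.0.0.1 8100 30 || exit 1`, with the host and port taken from the -accept option.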

Step 1 | Cleaning the MS Word environment:

 /////*.doc

A Bash or Perl script renames folders, subfolders and files, turning Windows-style names into a simpler Unix-like syntax.

cleanfolder.sh
#!/bin/bash
# file cleanfolder.sh 
# Convert filenames to lowercase
# and replace characters recursively
#####################################
 
if [ -z "$1" ]; then echo "Give target directory"; exit 1; fi

# -depth renames the contents of a folder before the folder itself
find "$1" -depth | while IFS= read -r file ; do
        directory=$(dirname "$file")
        oldfilename=$(basename "$file")
        newfilename=$(echo "$oldfilename" | tr 'A-Z' 'a-z' | tr ' ' '_' | sed 's/_-_/-/g')
        if [ "$oldfilename" != "$newfilename" ]; then
                # -n instead of -i: a prompt would read its answer from the find output
                mv -n "$directory/$oldfilename" "$directory/$newfilename"
                echo "$directory/$oldfilename ---> $directory/$newfilename"
        fi
        done
exit 0
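
To see what the renaming rule does to a single name, the same tr and sed pipeline can be wrapped in a small function. This is only an illustration; `clean_name` is a hypothetical helper name and the sample file name is made up.

```shell
#!/bin/bash
# Apply the cleanfolder.sh renaming rule to one name:
# lower case, blanks to underscores, "_-_" collapsed to "-".
clean_name() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr ' ' '_' | sed 's/_-_/-/g'
}

clean_name "Annual Report - Final.DOC"   # prints annual_report-final.doc
```

Accented characters are left untouched at this stage; oocwiki.pl deals with them in its own renaming pass.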

Step 2 :

lower_case/without_blank_space.doc --> soffice as a service + JODConverter --> *.html

oocwiki.pl
 
#!/usr/bin/perl
# script oocwiki.pl
#
# Usage: perl oocwiki.pl <working folder> <jodconverter-cli jar>
#
# Renames every *.doc file below the working folder, converts it to
# HTML with JODConverter (step 2), then to DokuWiki syntax with
# html2wiki (step 3).

use strict;
use warnings;
use utf8;                        # the tr/// table below contains accented characters
use Encode qw(decode encode);
use File::Basename;
use File::Find;

my $time = localtime;
print "The time is now $time\n";

my ($chemin, $jod) = @ARGV;
die "Usage: $0 <working folder> <jodconverter-cli jar>\n"
    unless defined $chemin && defined $jod;
print "$chemin\n";
print "$jod\n";

# no_chdir keeps $File::Find::name usable as a path inside Wanted
find({ wanted => \&Wanted, no_chdir => 1 }, $chemin);

sub Wanted {
    my $fullname = $File::Find::name;
    return unless -f $fullname;

    my ($name, $dir, $suffix) = fileparse($fullname, qr{\.[^.]*});
    return unless $suffix eq '.doc';

    # Step 1 renaming, again: lower case, no blanks, no accents.
    # Assumes the file names are encoded in UTF-8.
    my $base2 = lc decode('UTF-8', $name);
    $base2 =~ tr/ /_/;
    $base2 =~ tr/ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ/aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn/;
    $base2 = encode('UTF-8', $base2);

    my $newname = $dir . $base2 . $suffix;
    if ($newname ne $fullname) {
        print "$fullname ---> $newname\n";
        unless (rename($fullname, $newname)) {
            warn "Couldn't rename $fullname to $newname: $!\n";
            return;
        }
    }

    # Prepare the output names for the conversions
    (my $html = $newname) =~ s/\.doc$/.html/;
    (my $txt  = $newname) =~ s/\.doc$/.txt/;

    # Step 2: *.doc --> *.html (list form avoids shell quoting problems)
    system('java', '-jar', $jod, $newname, $html) == 0
        or warn "JODConverter failed on $newname\n";

    # Step 3: *.html --> *.txt in DokuWiki syntax
    if (open(my $wiki, '-|', 'html2wiki', '--dialect', 'DokuWiki', $html)) {
        if (open(my $out, '>', $txt)) {
            while (my $line = <$wiki>) {
                print {$out} $line;
            }
            close($out);
        } else {
            warn "Couldn't write $txt: $!\n";
        }
        close($wiki);
    } else {
        warn "Couldn't run html2wiki on $html: $!\n";
    }
}

Step 3 :

*.html --> HTML::WikiConverter (html2wiki) --> *.txt

Step 4 :

Finally, we simply copy the files to the media and pages folders, and that is enough for now. Still to do: a Perl script that rewrites the media URLs so they point to the right media location, and that dispatches the media and txt files to the right place on the server.
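
The dispatching part could look roughly like the sketch below. It is an assumption about how the split might be done, not the author's script: `dispatch` is a hypothetical helper, the list of image extensions is a guess at what the HTML export produces, and the media links inside the txt files are not rewritten.

```shell
#!/bin/bash
# dispatch SRC PAGES MEDIA
# Copy *.txt files below SRC to PAGES and image files to MEDIA,
# keeping the relative folder structure. Other files are skipped.
dispatch() {
    src=${1%/}; pages=${2%/}; media=${3%/}
    find "$src" -type f | while IFS= read -r file; do
        rel=${file#"$src"/}
        case "$file" in
            *.txt)                     dest=$pages ;;
            *.png|*.jpg|*.jpeg|*.gif)  dest=$media ;;
            *)                         continue ;;
        esac
        mkdir -p "$dest/$(dirname "$rel")"
        cp "$file" "$dest/$rel"
    done
}
```

With the variables of oocwiki.sh, the call would be `dispatch "$TMPOOCWIKI" "$OUTWIKI" "$OUTMEDIA"`, in place of the two cp -R lines of step 4.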

Step 5 :

Fix permissions.
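
oocwiki.sh does this with chown, chgrp and chmod -R 775, which also marks every page and media file as executable. A slightly stricter variant is sketched here as an assumption rather than a requirement: `fix_modes` is a hypothetical helper that gives 775 to folders and 664 to files.

```shell
#!/bin/bash
# fix_modes DIR
# Folders get 775 (rwxrwxr-x), regular files get 664 (rw-rw-r--).
fix_modes() {
    find "$1" -type d -exec chmod 775 {} +
    find "$1" -type f -exec chmod 664 {} +
}
```

Ownership still has to be set separately, for example with the chown and chgrp lines of oocwiki.sh (user wwwrun, group www).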

Command lines in use

First, you need OpenOffice.org on a Linux box.

Open a terminal and run:

soffice -headless -accept="socket,port=8100;urp;"

http://www.artofsolving.com/node/10

(Don't forget to use the cli jar.)

java -jar jodconverter-cli-2.2.1.jar A.doc A.pdf
java -jar jodconverter-cli-2.2.1.jar A.doc A.html

http://search.cpan.org/src/DIBERRI/HTML-WikiConverter-0.61/README

massou@linux-hj6y:~/Documents/momas/jodconverter-2.2.1/lib> html2wiki --dialect DokuWiki A.html > output.mw
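
The two commands above can be chained for a single file. This is a minimal sketch, assuming that JODCON holds the path of the cli jar as in oocwiki.sh and that soffice is already listening on port 8100; `swap_ext` and `convert_one` are hypothetical helper names.

```shell
#!/bin/bash
# swap_ext PATH EXT: print PATH with its .doc suffix replaced by .EXT
swap_ext() {
    printf '%s\n' "${1%.doc}.$2"
}

# convert_one FILE.doc: *.doc --> *.html --> *.txt (DokuWiki syntax)
# html2wiki only runs if the JODConverter step succeeded.
convert_one() {
    doc=$1
    html=$(swap_ext "$doc" html)
    txt=$(swap_ext "$doc" txt)
    java -jar "$JODCON" "$doc" "$html" &&
        html2wiki --dialect DokuWiki "$html" > "$txt"
}
```

For example, `convert_one /tmp/oocwiki/a.doc` would write /tmp/oocwiki/a.html and /tmp/oocwiki/a.txt next to the original file.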
tips/doc_to_wiki_syntax.txt · Last modified: 2017-05-18 13:22 by 82.142.128.18

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International