tips:pdfexport:htmldoc

[[http://www.htmldoc.org/|HTMLDOC]] is a free, high-quality HTML to PDF converter. The only drawback is that its current version doesn't support CSS (you can still achieve good results). The big advantage is that you don't need to install anything else (no Ghostscript, for example). \\ To use htmldoc in your wiki, do the following:
  * Install htmldoc (pretty easy)
  * Add the //Export to PDF// button as described in [[tips:pdfexport#common_changes]].
  * Create a temporary directory that the webserver can write to for the intermediate step.
  * In the function ''act_export'', in ''inc/actions.php'', add this (just after the "global" lines):<code php>
  if($act == 'export_pdf'){
    pdfmake(  p_wiki_xhtml($ID,$REV,false)  );
  }
</code>

…

<code php>
  header("Content-Disposition: attachment; filename=wikiexport" . str_replace(':','_',$_GET["id"]) . ".pdf");
</code>
To retrieve images from the wiki server, rewrite the relative links (I hope this doesn't cause security issues; I had problems with PNG files, so I converted them into JPEG format):
<code php>
  $text = preg_replace("'<img src=\"/(.*?)/lib/exe/fetch.php(.*?)media=(.*?)\"(.*?)>'si","<img src=\"http://" . $_SERVER['SERVER_NAME'] . "/\\1/data/media/\\3\">", $text); # for uploaded images
</code>

…

<code php>
  header("Content-Disposition: attachment; filename=".str_replace(' ','_',$conf['title']).'-'.end(explode('/',$_GET["id"])).".pdf");
</code>
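To see what this kind of rewrite does, here is a Python sketch of the same substitution (the wiki path ''wiki'' and the server name are made-up examples; the real code reads the server name from ''$_SERVER'''):

```python
import re

# Hypothetical <img> tag as DokuWiki renders an uploaded image
html = '<img src="/wiki/lib/exe/fetch.php?w=200&media=ns:photo.jpg" class="media">'

server = "wiki.example.org"  # stand-in for $_SERVER['SERVER_NAME']

# Same rewrite as the PHP preg_replace above: point the tag at the media
# file with an absolute URL so htmldoc can fetch it over HTTP
rewritten = re.sub(
    r'<img src="/(.*?)/lib/exe/fetch\.php(.*?)media=(.*?)"(.*?)>',
    rf'<img src="http://{server}/\1/data/media/\3">',
    html,
    flags=re.S | re.I,
)
print(rewritten)
```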
====== HTMLDOC recursive variant ======
My problem was that I needed support for exporting child pages, so I chose to modify/hack the [[#An_HTMLDOC_variant]] found on this page. Some of the remarks and improvements to [[#An_HTMLDOC_variant]] have also been included.

It performs a recursive export of your current page: any internal links will be followed and their pages converted to PDF too. The internal links are copied into the PDF, so they remain clickable just as they are in DokuWiki.

  * Follow the first steps of [[#An_HTMLDOC_variant]] (on this page)
  * Then insert this into ''inc/common.php'':<code php>
function pdfmake($text)
{
  // Variables used to stop the search for child pages
  global $pdfmake_recursion_level;
  global $pdfmake_recursion_current;
  global $pdfmake_links;
  $pdfmake_links = array();

  $pdfmake_recursion_level = 30;
  $pdfmake_recursion_current = 0;

  // Now search for children.
  $text = pdfmake_children($text);

  // And create the pdf
  pdfmake_inner($text);
}
function pdfmake_inner($text){
  global $lang;
  global $conf;

  $dir=DOKU_INC."tmp/";
  $filenameInput=$dir."input.html";
  $filenameOutput=$dir."output.pdf";

# Convert text and toctitle to destination code-page
  $text=iconv("utf-8",$conf['pdfcp'].'//TRANSLIT',$text);
# Change toctitle if needed
  if ($conf['customtoc']) {
    $toctitle=$conf['customtoc'];
  }
  elseif ($conf['uselangtoc']) {
    $toctitle=$lang['toc'];
  }
  else {
    $toctitle="Table of contents";
  }
  $toctitle=iconv("utf-8",$conf['pdfcp'],$toctitle);

# htmldoc-compatible name conversion
  $pdfcp=preg_replace("/windows/i","cp",$conf['pdfcp']);
  $text = preg_replace("'<div class=\"toc\"><div class=\"tocheader\">.*?</div></div>'si",'',$text );
  $text = preg_replace("'<a[^>]*?></a>'si", '', $text );

# Execute changes based on replaces.conf
  $replacesf=DOKU_INC . "conf/replaces.conf";
  if ($conf['usecustomreplace'] && file_exists($replacesf)) {
    $allreplaces=file_get_contents($replacesf);

# Delete comments from file
    $allreplaces=preg_replace("'(//.*|\s+#.*|^#.*)'",'',$allreplaces);

# Collapse multiple white-spaces into one
    $allreplaces=preg_replace("'(\t+| +)'",' ',$allreplaces);

# Delete unwanted spaces
    $allreplaces=preg_replace("'(^ +| +$)'",'',$allreplaces);

# Delete multiple empty lines
    $allreplaces=preg_replace("'\n+'","\n",$allreplaces);

# Split codepage sections
    $codepages=preg_split("'\n@'",$allreplaces,-1, PREG_SPLIT_NO_EMPTY);
    $cpreg=preg_quote($conf['pdfcp']);

# Find the used codepage
    foreach ($codepages as $codepage) {
      if (preg_match("'" . $cpreg . "'si",$codepage)) {
        $replaces=preg_replace("'" . $cpreg . "\n'si",'',$codepage);
        break;
      }
    }

# Split patterns
    $patterns=preg_split("'\n'",$replaces,-1, PREG_SPLIT_NO_EMPTY);
    foreach ($patterns as $onepair) {
# Split pairs
      $pairarray=preg_split("' '",$onepair);
# Make changes
      $text=str_replace($pairarray[0],$pairarray[1],$text);
    }
  }

  $text = preg_replace("'<img src=\"/(.*?)/lib/exe/fetch.php(.*?)media=(.*?)\"(.*?)>'si","<img src=\"http://" . $_SERVER['SERVER_NAME'] . "/\\1/lib/exe/fetch.php?media=\\3\">", $text); # for uploaded images
  $text = preg_replace("'<img src=\"/(.*?)>'si","<img src=\"http://" . $_SERVER['SERVER_NAME'] . "/\\1>", $text); # for built-in images, smileys for example
  $text = str_replace("href=\"/doku.php?id=", "href=\"#", $text); // correct internal links

  $textarr = preg_split("/\n/",$text);

# Find and change linked images
  $linkeds = preg_grep("'<a href=.*<img src=.* /></a>'i",$textarr);
  foreach ( $linkeds as $linked ) {
    $picture = preg_replace("/<a href=.*\">/i",'',$linked);
    $picture = preg_replace("'</a>'i",'',$picture);
    $found = "'".preg_quote($linked)."'";
    $text = preg_replace($found,$picture,$text);
  }
# HTML compatibility -> htmldoc wants <br> instead of <br/>
  $text = str_replace('/>','>',$text);
  $text = str_replace('<table', '<table border="1" ', $text);

# Write the string to a temporary html file
  $fp = fopen ($filenameInput, "w") or die ("can't create file");
  fwrite($fp,$text);
  fclose($fp);

# Use embedded fonts if needed
  if ($conf['embedfonts']) {
    $fontparam='--embedfonts';
  } else {
    $fontparam='';
  }

# JPEG compression rate settings
  $jpeg=" --jpeg=" . $conf['jpgrate'];

# PDF compatibility
  $pdf="-t " . $conf['pdfversion'];

# Document width
  $width=" --browserwidth " . $conf['browserwidth'];

# Convert using htmldoc
  $command = $conf['htmldocdir'] . "htmldoc " . $pdf . $width . $jpeg . " --charset ". $pdfcp  . " --no-title " . $fontparam . " --toctitle \"" . $toctitle . "\" -f " . $filenameOutput . " " . $filenameInput;

  system($command);

# Send to browser
  $filenameOutput=trim($filenameOutput);
  header("Content-type: application/pdf");
  header("Content-Disposition: attachment; filename=dokuwikiexport_" . str_replace(':','_',$_GET["id"]) . ".pdf");
  $fd = @fopen($filenameOutput,"r");
  // Bail out on error
  if($fd == false)
  {
    print 'Output file cannot be opened';
    exit;
  }

  while(!feof($fd)){
    echo fread($fd,2048);
  }
  fclose($fd);

# Clean up temporary files
  unlink($filenameInput);
  unlink($filenameOutput);
}

// Search for child pages and render their html
function pdfmake_children($text)
{
  // Extract recursion levels
  global $pdfmake_recursion_level;
  global $pdfmake_recursion_current;
  global $pdfmake_links;

  $links = array();

  $pdfmake_recursion_current += 1;
  //echo 'Current recursion level: ', $pdfmake_recursion_current, '<br>';

  // Will contain all subpages at the end.
  $innerText = '';

  // Find all links on the page
  $regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";
  preg_match_all($regex_pattern,$text,$matches);

  // The matching links are listed in $matches[1]. Sort them so that subnamespaces come before their parent namespaces.
  //sort($matches[1]);
  for($i=0; $i< count($matches[1]); $i++) {
    // Extract the internal dokuwiki id of the subpage. This is needed to perform the rendering.
    $link = substr($matches[1][$i], stripos($matches[1][$i],'title=')+7);
    //echo $link, '<br>';
    // Don't add a page which has already been included
    if(!in_array($link, $pdfmake_links)) {
      // Call the dokuwiki renderer if the link does not start with http (otherwise it is not an internal link)
      if(substr($link, 0, 4) != 'http') {
        $innerText .= p_wiki_xhtml($link,'',false);

        // Add the link to the collection so it can be sanitized later.
        $pdfmake_links[] = $link;
        $links[] = $link;
      }
    }
  }
  // Recurse into the next level of internal links
  if($pdfmake_recursion_current < $pdfmake_recursion_level) {
    //echo "inside recursion<br>";
    $innerText = pdfmake_children($innerText);
  }
  $text = pdfmake_correctlinks($text, $links);
  $innerText = pdfmake_correctlinks($innerText, $links);

  // Return all the text to the caller.
  return $text.$innerText;
}
function pdfmake_correctlinks($text, $links)
{
  for($i = 0; $i < count($links); $i++) {
    $link = $links[$i];
    // $link is the full path to the dokuwiki page. However, in the HTML output only the name after the last ":" is inserted as the id of a heading.
    $text = str_replace($link, substr($link, strrpos($link, ':')+1), $text);
  }

  return $text;
}
</code>
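For reference, the ''usecustomreplace'' branch above implies a ''conf/replaces.conf'' of roughly this shape: comment lines (''#'' or ''//'') are stripped, each line starting with ''@'' opens a section named after a codepage, and every remaining line is one space-separated "search replacement" pair. The pairs below are made-up examples, not part of the original:

```
# conf/replaces.conf - one section per codepage, one pair per line
@windows-1252
&trade; (TM)
&hellip; ...
@windows-1251
&raquo; >>
```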

Remember that I have only tested this on my own server (where it works), so expect bugs and/or strange behavior.
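The child-page collection logic above can be sketched in Python (a hypothetical page table stands in for ''p_wiki_xhtml'', and link extraction is simplified to the ''title'' attribute of internal links): follow every internal link, render each page at most once, and stop at a fixed recursion depth.

```python
import re

# Hypothetical mini-wiki: page id -> rendered XHTML (stand-in for p_wiki_xhtml)
PAGES = {
    "start":    '<a href="/doku.php?id=ns:child" title="ns:child">child</a>',
    "ns:child": '<a href="/doku.php?id=start" title="start">back</a> body',
}

def render(page_id):
    return PAGES.get(page_id, "")

def collect_children(text, seen, depth, max_depth=30):
    """Append each linked page's HTML once, then recurse into that HTML."""
    if depth >= max_depth:
        return ""
    inner = ""
    for link in re.findall(r'<a href="[^"]*" title="([^"]+)"', text):
        if link in seen or link.startswith("http"):
            continue  # already included, or an external link
        seen.add(link)
        inner += render(link)
    if inner:
        inner += collect_children(inner, seen, depth + 1, max_depth)
    return inner

seen = {"start"}
text = render("start")
text += collect_children(text, seen, 0)

# Like pdfmake_correctlinks: only the part after the last ":" survives
# as a heading id in the output, so shorten every collected link.
for link in seen:
    text = text.replace(link, link.rsplit(":", 1)[-1])

print(text)
```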

===== Bug fixes =====
A list of fixed bugs:
  * 2009-10-31:
    * Fixed a bug in the command line which, on some pages, caused the PDF generation to fail.
    * Fixed a bug where unconvertible UTF-8 characters (such as -> and <-) broke the PDF generation.

 --- //[[nicklas.overgaard@gmail.com|Nicklas Overgaard]] 2009-10-31 16:45 GMT+1//
  
====== HTMLDOC and OS X ======
…

<code php>
  $filenameOutput=tempnam('','pdf');
</code>

====== HTMLDOC request ======
I think it would be very useful if you could create a page listing the wiki pages to export, and have HTMLDOC export all of them into a single PDF file.\\
  
So you can create pages from which you can extract a PDF file based on several wiki pages.

**Check the** [[#HTMLDOC_recursive_variant]]; it should support the requested feature.
  
===== Config problem with HTMLDOC variant =====

You have to declare all values in your ''config.metadata.php''.
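A sketch of what those declarations could look like, assuming DokuWiki's usual ''conf/metadata.php'' convention (''$meta['key'] = array('type');''). The types and comments are plausible guesses from how the settings are used above, not taken from the original:

```php
<?php
// conf/metadata.php - declare the settings used by pdfmake_inner()
$meta['pdfcp']            = array('string');   // target codepage, e.g. windows-1252
$meta['customtoc']        = array('string');   // custom TOC title ('' = disabled)
$meta['uselangtoc']       = array('onoff');    // use $lang['toc'] as TOC title
$meta['usecustomreplace'] = array('onoff');    // apply conf/replaces.conf
$meta['embedfonts']       = array('onoff');    // pass --embedfonts to htmldoc
$meta['jpgrate']          = array('numeric');  // JPEG quality for --jpeg
$meta['pdfversion']       = array('string');   // htmldoc -t target, e.g. pdf14
$meta['browserwidth']     = array('numeric');  // --browserwidth value
$meta['htmldocdir']       = array('string');   // directory of the htmldoc binary
```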

===== Changes to the TOC =====

Some recent changes in the DokuWiki core will break all the TOC-related code above, because [[https://github.com/dokuwiki/dokuwiki/commit/d5acc30de20298eb6ed7545e70484599c4d95867|the HTML for the TOC has been rewritten]]. The changes will be part of DokuWiki from the next release on (autumn 2012).

Except where otherwise noted, content on this wiki is licensed under CC Attribution-Share Alike 4.0 International.