BOMfix Syntax PlugIn

Compatible with DokuWiki: 2005-07-13+

Plugin: Suppress UTF-8 Byte-Order-Mark

Last updated on: 2008-11-16
Provides: Syntax, Render

This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.

Tagged with bom, openoffice, utf-8

If you always edit your wiki pages with DokuWiki's built-in editor (i.e. the HTML form-based edit option), you won't need this plugin at all.

External editors (i.e. separate standalone programs like word processing software) usually mark a file as UTF-8 by prepending a “magic” byte sequence1) at the very start of the file. While this does no harm as far as DokuWiki is concerned, those “magic” bytes do appear in the page presented to the user.
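
For illustration: the UTF-8 form of that sequence is the three bytes EF BB BF. The following minimal sketch is not part of the plugin; it only checks whether a string starts with those bytes. The function name hasUtf8Bom and the sample page text are made up for this example.

```php
<?php
// Hypothetical helper, not part of the plugin: report whether $text
// starts with the UTF-8 byte order mark (bytes EF BB BF).
function hasUtf8Bom(string $text): bool {
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

// A page as saved by an external editor with a BOM, and one without.
$withBom    = "\xEF\xBB\xBF====== Heading ======\n";
$withoutBom = "====== Heading ======\n";

var_dump(hasUtf8Bom($withBom));     // bool(true)
var_dump(hasUtf8Bom($withoutBom));  // bool(false)
```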

Depending on a page's actual content and the CSS rules in effect, this may lead to undesired results. One way to get rid of the problem would be to open the affected page(s) with DokuWiki's built-in edit feature and simply remove those bytes. However, the word processor would then open the file as plain text, assuming ASCII or, say, ISO-8859-1, whatever is configured as its default text format. That, in consequence, would invalidate (or at least render strangely) all non-ASCII UTF-8 character sequences.
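
To see why a wrong encoding guess "renders strangely", here is a small sketch that reads UTF-8 bytes as if each byte were one ISO-8859-1 character. It is only a demonstration of the effect, not something the plugin does; readAsLatin1 is a hypothetical helper name.

```php
<?php
// Hypothetical helper for this example: treat every byte of $bytes as one
// ISO-8859-1 character and re-encode the result as UTF-8 for display.
function readAsLatin1(string $bytes): string {
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= $code < 0x80
            ? $byte                                                 // ASCII stays as is
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F)); // U+0080..U+00FF
    }
    return $out;
}

$page = "K\xC3\xA4se";           // "Käse" stored as UTF-8: "ä" is the two bytes C3 A4
echo readAsLatin1($page), "\n";  // prints "KÃ¤se": each byte became its own character
```

Pure ASCII text passes through unchanged, which is why the problem only shows up on pages containing non-ASCII characters.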

Actually, that is the recommended approach if (and only if) you never intend to edit the wiki pages with an external editor.

As it happens, I personally prefer to edit the pages (of a local DokuWiki installation) with editors like Kate or OpenOffice.org for various reasons2). Therefore I3) need those “magic” bytes, but I don't want them to show up in the pages presented to the end user (reader). Enter syntax_plugin_bomfix.

Usage

The whole purpose of this plugin is to suppress the output of that “magic” byte sequence, and nothing more.

Note: This plugin introduces no new wiki language features, nor is there anything special to remember when editing existing or newly created pages. Besides installing the plugin, there is nothing to do or observe.
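
The effect can be approximated outside DokuWiki with a plain regular expression. This sketch does not show how the plugin hooks into DokuWiki's lexer; it only reuses the same byte pattern on an ordinary string, and suppressBom is a hypothetical name chosen for the example.

```php
<?php
// Hypothetical stand-alone approximation: drop a BOM at the very start of
// the text and leave everything else untouched.
function suppressBom(string $text): string {
    // \A anchors at the start of the subject; \xEF\xBB\xBF are the BOM bytes.
    // preg_replace() returns null on error, in which case the input is kept.
    return preg_replace('/\A\xEF\xBB\xBF/', '', $text) ?? $text;
}

$page = "\xEF\xBB\xBFHello";
echo strlen($page), "\n";               // 8 (3 BOM bytes + 5 letters)
echo strlen(suppressBom($page)), "\n";  // 5
```

Text without a leading BOM is returned unchanged, matching the plugin's promise that it does "nothing more".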

Installation

It's quite easy to integrate this plugin with your DokuWiki:

  1. Download the source archive (~3KB) and unpack it in your DokuWiki plugin directory {dokuwiki}/lib/plugins (make sure the included subdirectories are unpacked correctly); this will create the directory {dokuwiki}/lib/plugins/bomfix.
  2. Make sure both the new directory and the files therein are readable by the web server, e.g.
    	chown -Rc apache:apache dokuwiki/lib/plugins/bomfix

Alternatively, you can use the plugin manager to install or update this plugin.
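
If you want to double-check the permissions after a manual install, something like the following sketch could help. It is an assumption-laden illustration: findUnreadable is a hypothetical helper, the path is only an example location, and the check reflects the user running the script, so it is only meaningful when run as the web server's user.

```php
<?php
// Hypothetical helper: list every file or directory at or below $dir that
// the current process cannot read.
function findUnreadable(string $dir): array {
    if (!is_readable($dir)) {
        return [$dir];
    }
    $unreadable = [];
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,      // visit directories too
        RecursiveIteratorIterator::CATCH_GET_CHILD  // skip unreadable subdirectories
    );
    foreach ($items as $item) {
        if (!$item->isReadable()) {
            $unreadable[] = $item->getPathname();
        }
    }
    return $unreadable;
}

// Assumed install location; adjust to your own {dokuwiki} directory.
$pluginDir = '/var/www/dokuwiki/lib/plugins/bomfix';
if (is_dir($pluginDir)) {
    foreach (findUnreadable($pluginDir) as $path) {
        echo "not readable: $path\n";
    }
}
```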

Plugin Source

Here is the GPLed PHP source4) for those who'd like to review it before actually installing it:

<?php
if (! class_exists('syntax_plugin_bomfix')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // Include parent class:
  require_once(DOKU_PLUGIN . 'syntax.php');
 
/**
 * <tt>syntax_plugin_bomfix.php </tt>- A PHP4 class that implements
 * a <tt>DokuWiki</tt> plugin for <tt>UTF8 "magic" bytes</tt>.
 *
 * <p>
 * External editors (i.e. separate standalone programs like wordprocessing
 * software) usually mark a file in UTF8 format by prepending its content
 * with a "magic" byte sequence at the very start of file. While there is
 * no harm in it as far as DokuWiki is concerned those "magic" bytes
 * (Byte Order Mark) <em>do</em> appear in the page presented to the user.
 * </p><p>
 * Depending on a page's actual content and the respective CSS rules in
 * effect this may lead to undesired results. One way to get rid of this
 * problem would be to open the affected page(s) with DokuWiki's builtin
 * edit feature and simply remove those bytes. However, such an approach
 * would cause the wordprocessor to open the file as plain text assuming
 * it's in ASCII or, say, ISO-8859-1 format - whatever may be configured
 * as the default text format. That, in consequence, would invalidate (or
 * at least render strangely) all UTF8 character sequences.
 * </p><p>
 * Actually that is the recommended approach <em>if</em> (i.e. <tt>if</tt>)
 * you never intend to edit the wiki pages by an external editor.
 * </p><p>
 * As it happens, personally I prefer to edit the pages (of a local DokuWiki
 * installation) by OpenOffice.org for various reasons. (And, yes, I know
 * that I bypass DokuWiki's changes-system this way.) Therefore I need those
 * "magic" bytes <em>but</em> I don't want them to show up in the pages
 * presented to the end user (reader). Enter <tt>syntax_plugin_bomfix</tt>.
 * The whole purpose of this plugin is to suppress the output of that
 * "magic" byte sequence. And nothing more. There are no new wiki language
 * features introduced by this plugin. Nor is there anything special you
 * have to remember when editing one of your already existing or newly
 * created pages.
 * </p><p>
 * To use it just install the plugin in your DokuWiki's plugin folder.
 * That's all.
 * </p><pre>
 *  Copyright (C) 2006, 2008  M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : &lt;support@mwat.de&gt;
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_bomfix.php,v 1.5 2008/11/16 13:21:55 matthias Exp $</tt>
 * @since created 24-Dec-2006
 */
class syntax_plugin_bomfix extends DokuWiki_Syntax_Plugin {
 
  /**
   * @publicsection
   */
  //@{
 
  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>FALSE</tt> always since no nested markup
   * is possible with this plugin.
   * @public
   */
  function accepts($aMode) {
    return FALSE;
  } // accepts()
 
  /**
   * Connect lookup pattern to lexer.
   *
   * @param $aMode String The syntax mode to connect the pattern to.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    $this->Lexer->addSpecialPattern('^\xEF\xBB\xBF',
      $aMode, 'plugin_bomfix');
  } // connectTo()
 
  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array(
      'author' =>  'Matthias Watermann',
      'email' =>  'support@mwat.de',
      'date' =>  '2008-11-16',
      'name' =>  'BOMfix Syntax Plugin',
      'desc' =>  'Ignore UTF8 "magic" bytes at start of page',
      'url' =>  'http://www.dokuwiki.org/plugin:bomfix');
  } // getInfo()
 
  /**
   * Where to sort in?
   *
   * @return Integer <tt>380</tt> (doesn't really matter).
   * @static
   * @public
   */
  function getSort() {
    return 380;
  } // getSort()
 
  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'substition'</tt> (i.e. 'substitution').
   * @static
   * @public
   */
  function getType() {
    return 'substition';  // sic! should be __substitution__
  } // getType()
 
  /**
   * Handler to prepare matched data for the rendering process.
   *
   * <p>
   * The <tt>$aState</tt> parameter gives the type of pattern
   * which triggered the call to this method:
   * </p>
   * <dl>
   * <dt>DOKU_LEXER_ENTER</dt>
   * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
   * <dt>DOKU_LEXER_MATCHED</dt>
   * <dd>a pattern set by <tt>addPattern()</tt></dd>
   * <dt>DOKU_LEXER_EXIT</dt>
   * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
   * <dt>DOKU_LEXER_SPECIAL</dt>
   * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
   * <dt>DOKU_LEXER_UNMATCHED</dt>
   * <dd>ordinary text encountered within the plugin's syntax mode
   * which doesn't match any pattern.</dd>
   * </dl><p>
   * This implementation does nothing (ignoring the passed arguments)
   * and just returns the given <tt>$aState</tt>.
   * </p>
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Integer The current lexer state.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    return $aState;  // doesn't really matter as it's ignored anyway ...
  } // handle()
 
  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aFormat</tt> and returns
   * <tt>FALSE</tt> when a format isn't supported.
   * <tt>$aRenderer</tt> contains a reference to the renderer object
   * which is currently handling the rendering.
   * The contents of <tt>$aData</tt> is the return value of the
   * <tt>handle()</tt> method.
   * </p><p>
   * Besides "eating" the BOM implicitly this implementation does
   * nothing (ignoring all passed arguments) and always returns
   * <tt>TRUE</tt>.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Integer The data created/returned by the
   * <tt>handle()</tt> method.
   * @return Boolean <tt>TRUE</tt> always since there's no actual
   * rendering done and hence can't ever fail.
   * @public
   * @see handle()
   * @static
   */
  function render($aFormat, &$aRenderer, $aData) {
    // nothing to do here - just 'eat' the BOM
    return TRUE;
  } // render()
 
  //@}
} // class syntax_plugin_bomfix
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>

Changes

2008-11-16:
2008-10-29:
* minor doc corrections;

2007-08-15:
* added GPL link and fixed some doc problems;

2007-12-26:
+ initial release;

Matthias Watermann 2008-11-16

1) often called BOM: Byte Order Mark
2) and, yes, I know that I bypass DokuWiki's locking and changes system this way; but I know what, when and how I'm doing it…
3) i.e. the editor
4) The comments within the source file are suitable for the OSS doxygen tool, a documentation system for C++, C, Java, Objective-C, Python, IDL and to some extent PHP, C#, and D. Since I'm working with different programming languages, it's a great help to have one tool that handles the docs for all of them.
plugin/bomfix.txt · Last modified: 2011-06-18 14:24 by ach