BOMfix Syntax PlugIn

Compatible with DokuWiki: 2005-07-13+
Suppress UTF-8 Byte-Order-Mark
Last updated on: 2008-11-16
Provides: Syntax, Render

This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.

Tagged with bom, openoffice, utf-8

If you always edit your wiki pages with DokuWiki's built-in editor (i.e. the HTML form based edit option), you won't need this plugin at all.

External editors (i.e. separate standalone programs like word processing software) usually mark a file as UTF-8 by prepending a “magic” byte sequence1) to its content at the very start of the file. While DokuWiki itself takes no harm from this, those “magic” bytes do appear in the page presented to the user.
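To make the “magic” bytes concrete: the UTF-8 BOM is the three-byte sequence EF BB BF, the same bytes the plugin source matches. The following is a minimal sketch of how one might check a page's raw text for it; `has_utf8_bom` and the sample text are hypothetical and not part of DokuWiki or of this plugin.

```php
<?php
// UTF-8 encoded byte order mark: the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

/**
 * Report whether $text starts with the UTF-8 BOM.
 * (has_utf8_bom is a hypothetical helper for illustration only.)
 */
function has_utf8_bom(string $text): bool {
    return strncmp($text, UTF8_BOM, 3) === 0;
}

// Sample page text as an external editor might save it (assumed content).
$saved = UTF8_BOM . "====== Start ======\n";

var_dump(has_utf8_bom($saved));            // bool(true)
echo bin2hex(substr($saved, 0, 3)), "\n";  // efbbbf
```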

Depending on a page's actual content and the CSS rules in effect, this may lead to undesired results. One way to get rid of the problem would be to open the affected page(s) with DokuWiki's built-in edit feature and simply remove those bytes. However, the word processor would then open the file as plain text, assuming it is in ASCII or, say, ISO-8859-1 format (whatever is configured as its default text format). That, in consequence, would invalidate (or at least render strangely) all UTF-8 character sequences.
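The “render strangely” effect can be reproduced in a few lines. The sketch below reinterprets every byte of a UTF-8 string as one ISO-8859-1 character, which is roughly what an editor does when it assumes the wrong default encoding; `latin1_view` is a hypothetical helper, and the sample word is just an assumption for illustration.

```php
<?php
/**
 * Reinterpret each byte of $bytes as one ISO-8859-1 character and
 * return the result encoded as UTF-8, i.e. what a reader sees when
 * UTF-8 text is opened under the wrong (Latin-1) assumption.
 * (latin1_view is a hypothetical helper for illustration only.)
 */
function latin1_view(string $bytes): string {
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        // ASCII bytes stay as they are; bytes >= 0x80 become two-byte UTF-8.
        $out .= ($b < 0x80)
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

$utf8 = "caf\xC3\xA9";           // "café" stored as UTF-8 bytes
echo latin1_view($utf8), "\n";   // prints "cafÃ©"
```

The single character "é" turns into two unrelated characters, which is why removing the BOM is only advisable if no external editor will touch the file again.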

Actually, that is the recommended approach if you never intend to edit the wiki pages with an external editor.

As it happens, I personally prefer to edit the pages (of a local DokuWiki installation) with editors like Kate or OpenOffice.org for various reasons2). Therefore I3) need those “magic” bytes, but I don't want them to show up in the pages presented to the end user (reader). Enter syntax_plugin_bomfix.

Usage

The whole purpose of this plugin is to suppress the output of that “magic” byte sequence. And nothing more.

Note: This plugin introduces no new wiki language features. Nor is there anything special you have to remember when editing your existing or newly created pages. Hence, besides installing this plugin, there's nothing to do or observe.

Installation

Search and install the plugin using the Extension Manager.

Alternatively, refer to Plugins on how to install plugins manually. It's quite easy to integrate this plugin with your DokuWiki:

  1. Download the source archive (~3KB) and unpack it in your DokuWiki plugin directory {dokuwiki}/lib/plugins (make sure included subdirectories are unpacked correctly); this will create the directory {dokuwiki}/lib/plugins/bomfix.
  2. Make sure both the new directory and the files therein are readable by the web server, e.g.
    	chown -Rc apache:apache dokuwiki/lib/plugins/bomfix
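To verify the second step, one could list everything below the plugin directory that is not readable. This is only a sketch: `unreadable_paths` is a hypothetical helper, the default path is an assumption, and the result reflects the account that runs the script, so it is only meaningful when run as the web-server user.

```php
<?php
/**
 * Return $dir itself if it is unreadable, otherwise every path below
 * $dir that the current user cannot read.
 * (unreadable_paths is a hypothetical helper for illustration only.)
 */
function unreadable_paths(string $dir): array {
    if (!is_readable($dir)) {
        return [$dir];          // missing or unreadable: nothing more to scan
    }
    if (!is_dir($dir)) {
        return [];              // a single readable file
    }
    $bad = [];
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST,
        RecursiveIteratorIterator::CATCH_GET_CHILD  // don't abort on unreadable subdirectories
    );
    foreach ($items as $path => $info) {
        if (!$info->isReadable()) {
            $bad[] = $path;
        }
    }
    return $bad;
}

// Assumed path; adjust to your installation.
$dir = $argv[1] ?? 'dokuwiki/lib/plugins/bomfix';
foreach (unreadable_paths($dir) as $p) {
    echo "not readable: $p\n";
}
```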

Plugin Source

Here comes the GPLed PHP source4) for those who'd like to scan it before actually installing it:

<?php
if (! class_exists('syntax_plugin_bomfix')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // Include parent class:
  require_once(DOKU_PLUGIN . 'syntax.php');
 
/**
 * <tt>syntax_plugin_bomfix.php </tt>- A PHP4 class that implements
 * a <tt>DokuWiki</tt> plugin for <tt>UTF8 "magic" bytes</tt>.
 *
 * <p>
 * External editors (i.e. separate standalone programs like wordprocessing
 * software) usually mark a file in UTF8 format by prepending its content
 * with a "magic" byte sequence at the very start of file. While there is
 * no harm in it as far as DokuWiki is concerned those "magic" bytes
 * (Byte Order Mark) <em>do</em> appear in the page presented to the user.
 * </p><p>
 * Depending on a page's actual content and the respective CSS rules in
 * effect this may lead to undesired results. One way to get rid of this
 * problem would be to open the affected page(s) with DokuWiki's builtin
 * edit feature and simply remove those bytes. However, such an approach
 * would cause the wordprocessor to open the file as plain text assuming
 * it's in ASCII or, say, ISO-8859-1 format - whatever may be configured
 * as the default text format. That, in consequence, would invalidate (or
 * at least render strangely) all UTF8 character sequences.
 * </p><p>
 * Actually that is the recommended approach <em>if</em> (i.e. <tt>if</tt>)
 * you never intend to edit the wiki pages by an external editor.
 * </p><p>
 * As it happens, personally I prefer to edit the pages (of a local DokuWiki
 * installation) by OpenOffice.org for various reasons. (And, yes, I know
 * that I bypass DokuWiki's changes-system this way.) Therefore I need those
 * "magic" bytes <em>but</em> I don't want them to show up in the pages
 * presented to the end user (reader). Enter <tt>syntax_plugin_bomfix</tt>.
 * The whole purpose of this plugin is to suppress the output of that
 * "magic" byte sequence. And nothing more. There are no new wiki language
 * features introduced by this plugin. Nor is there anything special you
 * have to remember when editing one of your already existing or newly
 * created pages.
 * </p><p>
 * To use it just install the plugin in your DokuWiki's plugin folder.
 * That's all.
 * </p><pre>
 *  Copyright (C) 2006, 2008  M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : &lt;support@mwat.de&gt;
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_bomfix.php,v 1.5 2008/11/16 13:21:55 matthias Exp $</tt>
 * @since created 24-Dec-2006
 */
class syntax_plugin_bomfix extends DokuWiki_Syntax_Plugin {
 
  /**
   * @publicsection
   */
  //@{
 
  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>FALSE</tt> always since no nested markup
   * is possible with this plugin.
   * @public
   */
  function accepts($aMode) {
    return FALSE;
  } // accepts()
 
  /**
   * Connect lookup pattern to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    $this->Lexer->addSpecialPattern('^\xEF\xBB\xBF',
      $aMode, 'plugin_bomfix');
  } // connectTo()
 
  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array(
      'author' =>  'Matthias Watermann',
      'email' =>  'support@mwat.de',
      'date' =>  '2008-11-16',
      'name' =>  'BOMfix Syntax Plugin',
      'desc' =>  'Ignore UTF8 "magic" bytes at start of page',
      'url' =>  'http://www.dokuwiki.org/plugin:bomfix');
  } // getInfo()
 
  /**
   * Where to sort in?
   *
   * @return Integer <tt>380</tt> (doesn't really matter).
   * @static
   * @public
   */
  function getSort() {
    return 380;
  } // getSort()
 
  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'substition'</tt> (i.e. 'substitution').
   * @static
   * @public
   */
  function getType() {
    return 'substition';  // sic! should be __substitution__
  } // getType()
 
  /**
   * Handler to prepare matched data for the rendering process.
   *
   * <p>
   * The <tt>$aState</tt> parameter gives the type of pattern
   * which triggered the call to this method:
   * </p>
   * <dl>
   * <dt>DOKU_LEXER_ENTER</dt>
   * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
   * <dt>DOKU_LEXER_MATCHED</dt>
   * <dd>a pattern set by <tt>addPattern()</tt></dd>
   * <dt>DOKU_LEXER_EXIT</dt>
   * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
   * <dt>DOKU_LEXER_SPECIAL</dt>
   * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
   * <dt>DOKU_LEXER_UNMATCHED</dt>
   * <dd>ordinary text encountered within the plugin's syntax mode
   * which doesn't match any pattern.</dd>
   * </dl><p>
   * This implementation does nothing (ignoring the passed arguments)
   * and just returns the given <tt>$aState</tt>.
   * </p>
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Integer The current lexer state.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    return $aState;  // doesn't really matter as it's ignored anyway ...
  } // handle()
 
  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aFormat</tt> and returns
   * <tt>FALSE</tt> when a format isn't supported.
   * <tt>$aRenderer</tt> contains a reference to the renderer object
   * which is currently handling the rendering.
   * The contents of <tt>$aData</tt> is the return value of the
   * <tt>handle()</tt> method.
   * </p><p>
   * Besides "eating" the BOM implicitly this implementation does
   * nothing (ignoring all passed arguments) and always returns
   * <tt>TRUE</tt>.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Integer The data created/returned by the
   * <tt>handle()</tt> method.
   * @return Boolean <tt>TRUE</tt> always since there's no actual
   * rendering done and hence can't ever fail.
   * @public
   * @see handle()
   * @static
   */
  function render($aFormat, &$aRenderer, $aData) {
    // nothing to do here - just 'eat' the BOM
    return TRUE;
  } // render()
 
  //@}
} // class syntax_plugin_bomfix
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
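
The pattern registered in connectTo() above can be tried in isolation. The sketch below uses plain preg_match() with the same pattern text; it says nothing about how DokuWiki's lexer combines the pattern with others or which flags it applies. The sample strings are assumptions for illustration.

```php
<?php
// Same pattern text as in connectTo(), wrapped in '/' delimiters for preg_match().
// In a single-quoted PHP string the \xEF escapes reach PCRE unchanged,
// and PCRE itself reads them as the bytes EF BB BF.
$pattern = '/^\xEF\xBB\xBF/';

$withBom    = "\xEF\xBB\xBF====== Title ======";  // BOM at the very start
$withoutBom = "====== Title ======";              // no BOM at all
$midBom     = "Title \xEF\xBB\xBF text";          // same bytes, but not at the start

var_dump(preg_match($pattern, $withBom));     // int(1)
var_dump(preg_match($pattern, $withoutBom));  // int(0)
var_dump(preg_match($pattern, $midBom));      // int(0)
```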

Changes

2008-11-16:
2008-10-29:
* minor doc corrections;

2007-08-15:
* added GPL link and fixed some doc problems;

2006-12-26:
+ initial release;

Matthias Watermann 2008-11-16

See also

Plugins by the same author

Discussion

Hints, comments, suggestions …


1)
often called BOM: Byte Order Mark
2)
and, yes, I know that I bypass DokuWiki's locking and changes-system this way; but I know what, when and how I'm doing it…
3)
i.e. the editor
4)
The comments within the source file are suitable for the OSS doxygen tool, a documentation system for C++, C, Java, Objective-C, Python, IDL and to some extent PHP, C#, and D. Since I work with different programming languages, it is very convenient to have one tool that handles the docs for all of them.
plugin/bomfix.txt · Last modified: 2018-05-30 20:57 by Klap-in