BOMfix Syntax PlugIn

bomfix plugin by Matthias Watermann
Suppress UTF-8 Byte-Order-Mark

Last updated on 2008-11-16. Provides Syntax, Render.
Compatible with DokuWiki 2005-07-13+.

Tagged with bom, openoffice, utf-8.

    If you always edit your wiki pages with DokuWiki's built-in editor (i.e. the HTML-form-based edit option), you won't need this plugin at all.

    External editors (i.e. separate standalone programs like word-processing software) often mark a file in UTF-8 format by prepending a “magic” byte sequence1) to its content, at the very start of the file. While this does no harm as far as DokuWiki is concerned, those “magic” bytes do appear in the page presented to the user.
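    For illustration, the “magic” sequence is the three bytes EF BB BF, the UTF-8 encoding of the character U+FEFF. Below is a minimal sketch, assuming PHP 7 or later, of how a script could detect and drop it. The names UTF8_BOM, has_utf8_bom() and strip_utf8_bom() are hypothetical and are not part of DokuWiki or of this plugin.

```php
<?php
// Hypothetical helpers, not part of DokuWiki or syntax_plugin_bomfix.
// The UTF-8 Byte Order Mark is the encoding of U+FEFF: bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// TRUE if $text starts with the UTF-8 BOM.
function has_utf8_bom(string $text): bool
{
    return strncmp($text, UTF8_BOM, strlen(UTF8_BOM)) === 0;
}

// Return $text without a leading BOM; all other bytes are left untouched.
function strip_utf8_bom(string $text): string
{
    return has_utf8_bom($text) ? substr($text, strlen(UTF8_BOM)) : $text;
}
```

    Only a BOM at the very start is removed; the same bytes appearing later in the text are kept, since there they are ordinary (if invisible) content.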

    Depending on a page's actual content and the CSS rules in effect, this may lead to undesired results. One way to get rid of the problem would be to open the affected page(s) with DokuWiki's built-in editor and simply remove those bytes. However, the word processor would then open the file as plain text, assuming it's in ASCII or, say, ISO-8859-1 format (whatever is configured as its default text format). As a consequence, all UTF-8 character sequences would be invalidated (or at least rendered strangely).
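    To make the garbling concrete: once the BOM is gone and an editor falls back to ISO-8859-1, every byte of a multi-byte UTF-8 sequence is shown as a separate Latin-1 character. The sketch below (PHP 7+; the function name latin1_to_utf8() is hypothetical) simulates that misreading by hand, so it needs neither the mbstring nor the iconv extension.

```php
<?php
// Hypothetical helper: interpret every byte of $bytes as an ISO-8859-1
// character and return the corresponding UTF-8 string. Applied to text
// that is already UTF-8, this reproduces what an editor displays when it
// wrongly assumes ISO-8859-1.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                    // ASCII is identical in both encodings
        } else {
            $out .= chr(0xC0 | ($b >> 6))       // Latin-1 code points U+0080..U+00FF
                  . chr(0x80 | ($b & 0x3F));    // take two bytes in UTF-8
        }
    }
    return $out;
}

$umlaut = "\xC3\xA4";                           // "ä" encoded as UTF-8
echo latin1_to_utf8($umlaut), "\n";             // prints "Ã¤"
echo latin1_to_utf8("\xEF\xBB\xBF"), "\n";      // the BOM itself prints as "ï»¿"
```

    The second line also shows why a BOM that is not recognised as such can surface as stray characters at the top of a page.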

    Actually, that is the recommended approach, but only if you never intend to edit the wiki pages with an external editor.
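    If you do take that route, you first need to know which pages carry the BOM. A minimal sketch (PHP 7+) that lists page files starting with it; the function name find_pages_with_bom() is hypothetical, and the example path assumes DokuWiki's default data/pages directory, so adjust it to your installation.

```php
<?php
// Hypothetical helper: return the sorted paths of all *.txt files below
// $dir whose first three bytes are the UTF-8 BOM (EF BB BF).
function find_pages_with_bom(string $dir): array
{
    $found = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'txt') {
            continue;
        }
        // Read only the first three bytes; the rest of the page is irrelevant here.
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Example (assumed location, adjust as needed):
// print_r(find_pages_with_bom('/var/www/dokuwiki/data/pages'));
```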

    As it happens, I personally prefer to edit the pages (of a local DokuWiki installation) with editors like Kate or OpenOffice.org for various reasons2). Therefore I3) need those “magic” bytes, but I don't want them to show up in the pages presented to the end user (reader). Enter syntax_plugin_bomfix.

    Usage

    The whole purpose of this plugin is to suppress the output of that “magic” byte sequence. And nothing more.

    :!: This plugin introduces no new wiki language features. Nor is there anything special you have to remember when editing your existing or newly created pages. Hence, besides installing this plugin, there is nothing to do or to keep in mind.

    Installation

    It's quite easy to integrate this plugin with your DokuWiki:

    1. Download the source archive (~3KB) and unpack it in your DokuWiki plugin directory {dokuwiki}/lib/plugins (make sure included subdirectories are unpacked correctly); this will create the directory {dokuwiki}/lib/plugins/bomfix.
    2. Make sure both the new directory and the files therein are readable by the web server (adjust the user and group names to your system; Debian, for instance, uses www-data), e.g.
        chown -Rc apache:apache dokuwiki/lib/plugins/*

    You can also use the plugin manager to install or update this plugin.

    Plugin Source

    Here is the GPL-licensed PHP source4) for those who'd like to look it over before actually installing it:

    <?php
    if (! class_exists('syntax_plugin_bomfix')) {
      if (! defined('DOKU_PLUGIN')) {
        if (! defined('DOKU_INC')) {
          define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
        } // if
        define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
      } // if
      // Include parent class:
      require_once(DOKU_PLUGIN . 'syntax.php');
     
    /**
     * <tt>syntax_plugin_bomfix.php</tt> - A PHP4 class that implements
     * a <tt>DokuWiki</tt> plugin for <tt>UTF8 "magic" bytes</tt>.
     *
     * <p>
     * External editors (i.e. separate standalone programs like wordprocessing
     * software) usually mark a file in UTF8 format by prepending its content
     * with a "magic" byte sequence at the very start of file. While there is
     * no harm in it as far as DokuWiki is concerned those "magic" bytes
     * (Byte Order Mark) <em>do</em> appear in the page presented to the user.
     * </p><p>
     * Depending on a page's actual content and the respective CSS rules in
     * effect this may lead to undesired results. One way to get rid of this
     * problem would be to open the affected page(s) with DokuWiki's builtin
     * edit feature and simply remove those bytes. However, such an approach
     * would cause the wordprocessor to open the file as plain text assuming
     * it's in ASCII or, say, ISO-8859-1 format - whatever may be configured
     * as the default text format. That, in consequence, would invalidate (or
     * at least render strangely) all UTF8 character sequences.
     * </p><p>
     * Actually that is the recommended approach <em>if</em> (i.e. <tt>if</tt>)
     * you never intend to edit the wiki pages by an external editor.
     * </p><p>
     * As it happens, personally I prefer to edit the pages (of a local DokuWiki
     * installation) with OpenOffice.org for various reasons. (And, yes, I know
     * that I bypass DokuWiki's changes-system this way.) Therefore I need those
     * "magic" bytes <em>but</em> I don't want them to show up in the pages
     * presented to the end user (reader). Enter <tt>syntax_plugin_bomfix</tt>.
     * The whole purpose of this plugin is to suppress the output of that
     * "magic" byte sequence. And nothing more. There are no new wiki language
     * features introduced by this plugin. Nor is there anything special you
     * have to remember when editing one of your already existing or newly
     * created pages.
     * </p><p>
     * To use it just install the plugin in your DokuWiki's plugin folder.
     * That's all.
     * </p><pre>
     *  Copyright (C) 2006, 2008  M.Watermann, D-10247 Berlin, FRG
     *      All rights reserved
     *    EMail : &lt;support@mwat.de&gt;
     * </pre>
     * <div class="disclaimer">
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either
     * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
     * License, or (at your option) any later version.<br>
     * This software is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     * General Public License for more details.
     * </div>
     * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
     * @version <tt>$Id: syntax_plugin_bomfix.php,v 1.5 2008/11/16 13:21:55 matthias Exp $</tt>
     * @since created 24-Dec-2006
     */
    class syntax_plugin_bomfix extends DokuWiki_Syntax_Plugin {
     
      /**
       * @publicsection
       */
      //@{
     
      /**
       * Tell the parser whether the plugin accepts syntax mode
       * <tt>$aMode</tt> within its own markup.
       *
       * @param $aMode String The requested syntaxmode.
       * @return Boolean <tt>FALSE</tt> always since no nested markup
       * is possible with this plugin.
       * @public
       */
      function accepts($aMode) {
        return FALSE;
      } // accepts()
     
      /**
       * Connect lookup pattern to lexer.
       *
       * @param $aMode String The syntax mode to connect the pattern to.
       * @public
       * @see render()
       */
      function connectTo($aMode) {
        $this->Lexer->addSpecialPattern('^\xEF\xBB\xBF',
          $aMode, 'plugin_bomfix');
      } // connectTo()
     
      /**
       * Get an associative array with plugin info.
       *
       * <p>
       * The returned array holds the following fields:
       * <dl>
       * <dt>author</dt><dd>Author of the plugin</dd>
       * <dt>email</dt><dd>Email address to contact the author</dd>
       * <dt>date</dt><dd>Last modified date of the plugin in
       * <tt>YYYY-MM-DD</tt> format</dd>
       * <dt>name</dt><dd>Name of the plugin</dd>
       * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
       * <dt>url</dt><dd>Website with more information on the plugin
       * (eg. syntax description)</dd>
       * </dl>
       * @return Array Information about this plugin class.
       * @public
       * @static
       */
      function getInfo() {
        return array(
          'author' =>  'Matthias Watermann',
          'email' =>  'support@mwat.de',
          'date' =>  '2008-11-16',
          'name' =>  'BOMfix Syntax Plugin',
          'desc' =>  'Ignore UTF8 "magic" bytes at start of page',
          'url' =>  'http://www.dokuwiki.org/plugin:bomfix');
      } // getInfo()
     
      /**
       * Where to sort in?
       *
       * @return Integer <tt>380</tt> (doesn't really matter).
       * @static
       * @public
       */
      function getSort() {
        return 380;
      } // getSort()
     
      /**
       * Get the type of syntax this plugin defines.
       *
       * @return String <tt>'substition'</tt> (i.e. 'substitution').
       * @static
       * @public
       */
      function getType() {
        return 'substition';  // sic! DokuWiki's own spelling of 'substitution'
      } // getType()
     
      /**
       * Handler to prepare matched data for the rendering process.
       *
       * <p>
       * The <tt>$aState</tt> parameter gives the type of pattern
       * which triggered the call to this method:
       * </p>
       * <dl>
       * <dt>DOKU_LEXER_ENTER</dt>
       * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
       * <dt>DOKU_LEXER_MATCHED</dt>
       * <dd>a pattern set by <tt>addPattern()</tt></dd>
       * <dt>DOKU_LEXER_EXIT</dt>
       * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
       * <dt>DOKU_LEXER_SPECIAL</dt>
       * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
       * <dt>DOKU_LEXER_UNMATCHED</dt>
       * <dd>ordinary text encountered within the plugin's syntax mode
       * which doesn't match any pattern.</dd>
       * </dl><p>
       * This implementation does nothing (ignoring the passed arguments)
       * and just returns the given <tt>$aState</tt>.
       * </p>
       * @param $aMatch String The text matched by the patterns.
       * @param $aState Integer The lexer state for the match.
       * @param $aPos Integer The character position of the matched text.
       * @param $aHandler Object Reference to the Doku_Handler object.
       * @return Integer The current lexer state.
       * @public
       * @see render()
       * @static
       */
      function handle($aMatch, $aState, $aPos, &$aHandler) {
        return $aState;  // doesn't really matter as it's ignored anyway ...
      } // handle()
     
      /**
       * Handle the actual output creation.
       *
       * <p>
       * The method checks for the given <tt>$aFormat</tt> and returns
       * <tt>FALSE</tt> when a format isn't supported.
       * <tt>$aRenderer</tt> contains a reference to the renderer object
       * which is currently handling the rendering.
       * The contents of <tt>$aData</tt> is the return value of the
       * <tt>handle()</tt> method.
       * </p><p>
       * Besides "eating" the BOM implicitly this implementation does
       * nothing (ignoring all passed arguments) and always returns
       * <tt>TRUE</tt>.
       * </p>
       * @param $aFormat String The output format to generate.
       * @param $aRenderer Object A reference to the renderer object.
       * @param $aData Integer The data created/returned by the
       * <tt>handle()</tt> method.
       * @return Boolean <tt>TRUE</tt> always since there's no actual
       * rendering done and hence can't ever fail.
       * @public
       * @see handle()
       * @static
       */
      function render($aFormat, &$aRenderer, $aData) {
        // nothing to do here - just 'eat' the BOM
        return TRUE;
      } // render()
     
      //@}
    } // class syntax_plugin_bomfix
    } // if
    //Setup VIM: ex: et ts=2 enc=utf-8 :
    ?>
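    The plugin's whole effect hinges on the pattern registered in connectTo(): '^\xEF\xBB\xBF', i.e. the three BOM bytes at the start of the input. As a rough standalone illustration only (DokuWiki applies the pattern through its own lexer together with all other registered modes, so this is not how the wiki actually processes a page), the same byte pattern can be tried with plain preg_match() and preg_replace():

```php
<?php
// Same byte pattern as in connectTo(), wrapped in delimiters for PCRE.
// Without the /u modifier, \xEF etc. match single raw bytes.
$pattern = '/^\xEF\xBB\xBF/';

$page = "\xEF\xBB\xBF====== Welcome ======\nSome text.";

// Does the raw page start with a BOM?
var_dump(preg_match($pattern, $page));            // int(1)

// Removing the match leaves the page as the reader should see it.
$clean = preg_replace($pattern, '', $page);
var_dump(strncmp($clean, '======', 6) === 0);     // bool(true)

// A BOM that is not at the very start is not matched (no /m modifier here).
var_dump(preg_match($pattern, "x\xEF\xBB\xBF"));  // int(0)
```

    Since the plugin's handle() and render() methods produce no output, a matched BOM is simply consumed, which mirrors the preg_replace() call above.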

    Changes

    2008-11-16:
    2008-10-29:
    * minor doc corrections;

    2007-08-15:
    * added GPL link and fixed some doc problems;

    2006-12-26:
    + initial release;

    Matthias Watermann 2008-11-16

    See also

    Plugins by the same author

    Discussion

    Hints, comments, suggestions …


    1) often called BOM: Byte Order Mark
    2) and, yes, I know that I bypass DokuWiki's locking and changes system this way; but I know what, when and how I'm doing it…
    3) i.e. the editor
    4) The comments within the source file are suitable for the OSS doxygen tool, a documentation system for C++, C, Java, Objective-C, Python, IDL and, to some extent, PHP, C# and D. Since I'm working with different programming languages, it's a great help to have one tool that handles the docs for all of them.
    5) obsoleted by incorporating its ability into the Code plugin
     
    plugin/bomfix.txt · Last modified: 2009/01/04 08:33 by 67.170.0.207
     
    Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported