Language Syntax PlugIn

Compatible with DokuWiki: 2005-07-13+

This plugin allows for adding markup to indicate other languages.

Last updated on: 2007-08-15
Provides: Syntax

This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.

Tagged with language

Sometimes there is a need to use words, phrases, or even whole sentences or paragraphs in a language different from the document's main language1). To support the readers2) of a document that uses several languages, it is advisable to explicitly mark up all language changes in the document.

This plugin allows for adding markup to indicate such language changes. It is implemented – technically speaking – by adding appropriate span tags around the text in question.

Usage

To use this plugin, enclose the text that is written in a language other than that of the rest of the document in lang tags:

<lang code>
...
</lang>

The code part is the language code, usually the two-letter code defined by ISO standard 639, Code for the representation of names of languages. The details of its use are explained in RFC 3066, Tags for the Identification of Languages. See the HTML specs as well for further details.

Please note that this is so-called inline markup, meaning it is to be used inside block elements3). The lang tag (like its HTML equivalent span) does not constitute a text block of its own but is part of one. Consequently, you'll have to open a new block (by inserting an empty line) if you want to mark up a whole paragraph, as can be seen in the following examples.

Examples

Suppose a document is written in plain English. Some sentences, however, are to be given in another language. Those “foreign” parts are therefore marked up as in the following examples:

**1**

This is an __English__ sentence. <lang de>Dies ist ein //deutscher// Satz.</lang> This is a second __English__ sentence.

**2**

This is an __English__ sentence.
<lang de-DE>Dies ist ein //deutscher// Satz.</lang>
This is a second __English__ sentence.

**3**

This is an __English__ sentence.
<lang de>
Dies ist ein //deutscher// Satz.
</lang>
This is a second __English__ sentence.

**4**

This is an __English__ paragraph.

<lang de->
Dies ist ein //deutscher// Absatz.
</lang>

This is a second __English__ paragraph.

**5**

This is an __English__ paragraph.

<lang x-klingon>Well, I, er ... dunno how to, hmmm... write Klingon.</lang>

This is a second __English__ paragraph.

As can be seen, the formatting4) follows the usual rules for inline markup. In sections one to three the text portion in a different language5) is just one part (here: a sentence) between other parts. In sections four and five, however, there are empty lines before and after the lang markup, which turns that part into a paragraph between other paragraphs.

The resulting HTML looks as follows:

<p><strong>1</strong></p>
 
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
 
<p><strong>2</strong></p>
 
<p>This is an <u>English</u> sentence. <span lang="de-DE" xml:lang="de-DE">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
 
<p><strong>3</strong></p>
 
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz. </span> This is a second <u>English</u> sentence.</p>
 
<p><strong>4</strong></p>
 
<p>This is an <u>English</u> paragraph.</p>
 
<p><span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Absatz. </span></p>
 
<p>This is a second <u>English</u> paragraph.</p>
 
<p><strong>5</strong></p>
 
<p>This is an <u>English</u> paragraph.</p>
 
<p><span lang="x-klingon" xml:lang="x-klingon">Well, I, er ... dunno how to, hmmm... write Klingon.</span></p>
 
<p>This is a second <u>English</u> paragraph.</p>
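
The mapping from the lang markup to the span elements shown above can be sketched outside DokuWiki as follows. This is only an illustration under stated assumptions: the function names normalize_lang_code and lang_span are hypothetical and not part of the plugin, and the checks merely mirror the kinds of language codes used in the examples (plain primary code, primary code with subtag, i-/x- codes, and a primary code with an empty subtag such as "de-").

```php
<?php
/**
 * Hypothetical helper (not part of the plugin): reduce a language code
 * to the value that ends up in the lang attribute, or FALSE when the
 * code is not acceptable.
 */
function normalize_lang_code($code) {
    $code = trim($code);
    // primary subtag only, e.g. "de"
    if (preg_match('/^[a-z]{2,3}$/i', $code)) {
        return $code;
    }
    // primary subtag plus one subtag, e.g. "de-DE"
    if (preg_match('/^[a-z]{2,3}-[a-z0-9]{2,}$/i', $code)) {
        return $code;
    }
    // one-letter primary subtag ("i" or "x") plus subtag, e.g. "x-klingon"
    if (preg_match('/^[ix]-[a-z0-9]{2,}$/i', $code)) {
        return $code;
    }
    // convenience: anything else that starts with a primary subtag and "-"
    // falls back to the primary subtag, e.g. "de-" becomes "de"
    if (preg_match('/^([a-z]{2,3})-/i', $code, $hits)) {
        return $hits[1];
    }
    return false;
}

/**
 * Hypothetical helper: wrap already rendered inline HTML in a span.
 * An invalid code leaves the HTML untouched (no span is emitted).
 */
function lang_span($code, $innerHtml) {
    $lang = normalize_lang_code($code);
    if (false === $lang) {
        return $innerHtml;
    }
    return '<span lang="' . $lang . '" xml:lang="' . $lang . '">'
        . $innerHtml . '</span>';
}

echo lang_span('de-', 'Dies ist ein <em>deutscher</em> Absatz.'), "\n";
```

Called with the code from example 4, the sketch produces a span with lang="de", which matches the HTML listed above for that example.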

Installation

It's quite easy to integrate this plugin with your DokuWiki:

  1. Download the source archive (~3KB) and unpack it in your DokuWiki plugin directory {dokuwiki}/lib/plugins (make sure the included subdirectories are unpacked correctly); this will create the directory {dokuwiki}/lib/plugins/lang.
  2. Make sure both the new directory and the files therein are readable by the web server, e.g. (the user and group names depend on your system)
        chown -Rc apache:apache dokuwiki/lib/plugins/*

Alternatively, you can use the plugin manager to install or update this plugin.

Plugin Source

Here is the GPLed PHP source6) for those who'd like to scan it before actually installing it:

<?php
if (! class_exists('syntax_plugin_lang')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // include parent class
  require_once(DOKU_PLUGIN . 'syntax.php');
 
/**
 * <tt>syntax_plugin_lang.php </tt>- A PHP4 class that implements
 * a <tt>DokuWiki</tt> plugin to specify an area using a different
 * language than the remaining document.
 *
 * <p>
 * Markup a section of text to be using a different language,
 * <tt>lang 2-letter-lang-code</tt>
 * </p><pre>
 *  Copyright (C) 2005, 2007 DFG/M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : &lt;support@mwat.de&gt;
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_lang.php,v 1.4 2007/08/15 12:36:19 matthias Exp $</tt>
 * @since created 1-Sep-2005
 */
class syntax_plugin_lang extends DokuWiki_Syntax_Plugin {
 
  /**
   * @publicsection
   */
  //@{
 
  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>TRUE</tt> unless <tt>$aMode</tt> is
   * <tt>plugin_lang</tt> (which would result in a
   * <tt>FALSE</tt> method result).
   * @public
   * @see getAllowedTypes()
   * @static
   */
  function accepts($aMode) {
    return ('plugin_lang' != $aMode);
  } // accepts()
 
  /**
   * Connect lookup pattern to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
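    // See http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.1;
    // better (specialized) REs are used in 'handle()' method.
    // The pattern requires a language code after "lang" and, by means
    // of the lookahead, a closing tag somewhere later in the text.
    $this->Lexer->addEntryPattern(
      '\x3Clang\s+[a-z\-A-Z0-9]{2,}\s*\x3E\s*(?=(?s).*?\x3C\x2Flang\x3E)',
      $aMode, 'plugin_lang');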
  } // connectTo()
 
  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array(
      'author' =>  'Matthias Watermann',
      'email' =>  'support@mwat.de',
      'date' =>  '2007-08-15',
      'name' =>  'LANGuage Syntax Plugin',
      'desc' =>  'Markup a text area using another language',
      'url' =>  'http://www.dokuwiki.org/plugin:lang');
  } // getInfo()
 
  /**
   * Where to sort in?
   *
   * @return Integer <tt>498</tt> (doesn't really matter).
   * @public
   * @static
   */
  function getSort() {
    return 498;
  } // getSort()
 
  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'formatting'</tt>.
   * @public
   * @static
   */
  function getType() {
    return 'formatting';
  } // getType()
 
  /**
   * Handler to prepare matched data for the rendering process.
   *
   * <p>
   * The <tt>$aState</tt> parameter gives the type of pattern
   * which triggered the call to this method:
   * </p>
   * <dl>
   * <dt>DOKU_LEXER_ENTER</dt>
   * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
   * <dt>DOKU_LEXER_MATCHED</dt>
   * <dd>a pattern set by <tt>addPattern()</tt></dd>
   * <dt>DOKU_LEXER_EXIT</dt>
   * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
   * <dt>DOKU_LEXER_SPECIAL</dt>
   * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
   * <dt>DOKU_LEXER_UNMATCHED</dt>
   * <dd>ordinary text encountered within the plugin's syntax mode
   * which doesn't match any pattern.</dd>
   * </dl>
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Array Index <tt>[0]</tt> holds the current
   * <tt>$aState</tt>, index <tt>[1]</tt> the match prepared for
   * the <tt>render()</tt> method.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    if (DOKU_LEXER_ENTER == $aState) {
      $hits = array();
      // RFC 3066, "2. The Language tag", p. 2f.
      // Language-Tag = Primary-subtag *( "-" Subtag )
      if (preg_match('|\s+([a-z]{2,3})\s*>|i', $aMatch, $hits)) {
        // primary _only_ (most likely to be used)
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3}\-[a-z0-9]{2,})\s*>|i',
      $aMatch, $hits)) {
        // primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([ix]\-[a-z0-9]{2,})\s*>|i', $aMatch, $hits)) {
        // 1-letter primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3})\-.*\s*>|i', $aMatch, $hits)) {
        // convenience: accept primary with empty subtag
        return array($aState, $hits[1]);
      } // if
      // invalid language specification
      return array($aState, FALSE);
    } // if
    return array($aState, $aMatch);
  } // handle()
 
  /**
   * Add exit pattern to lexer.
   *
   * @public
   */
  function postConnect() {
    $this->Lexer->addExitPattern('\x3C\x2Flang\x3E', 'plugin_lang');
  } // postConnect()
 
  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aFormat</tt> and returns
   * <tt>FALSE</tt> when a format isn't supported. <tt>$aRenderer</tt>
   * contains a reference to the renderer object which is currently
   * handling the rendering. The contents of <tt>$aData</tt> is the
   * return value of the <tt>handle()</tt> method.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Array The data created by the <tt>handle()</tt>
   * method.
   * @return Boolean <tt>TRUE</tt> if rendered successfully, or
   * <tt>FALSE</tt> otherwise.
   * @public
   * @see handle()
   *
   */
  function render($aFormat, &$aRenderer, &$aData) {
    if ('xhtml' != $aFormat) {
      return FALSE;
    } // if
    static $VALID = TRUE;  // flag to notice invalid markup
    switch ($aData[0]) {
      case DOKU_LEXER_ENTER:
        if ($aData[1]) {
          $aRenderer->doc .= '<span lang="' . $aData[1]
            . '" xml:lang="' . $aData[1] . '">';
        } else {
          $VALID = FALSE;
        } // if
        return TRUE;
      case DOKU_LEXER_UNMATCHED:
        $aRenderer->doc .= str_replace(array('&','<', '>'),
          array('&#38;', '&#60;', '&#62;'), $aData[1]);
        return TRUE;
      case DOKU_LEXER_EXIT:
        if ($VALID) {
          $aRenderer->doc .= '</span>';
        } else {
          $VALID = TRUE;
        } // if
      default:
        return TRUE;
    } // switch
  } // render()
 
  //@}
} // class syntax_plugin_lang
} // if
//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
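
The entry pattern registered in connectTo() can be tried outside DokuWiki with plain preg_match(). The sketch below is an assumption-laden illustration, not the plugin's own code: it uses a variant of the pattern in which the language code is mandatory and the parentheses are balanced, wraps it in / delimiters, and introduces a hypothetical helper find_lang_entry; the sample strings are made up. It shows the effect of the lookahead, which lets the opening tag match only if a closing tag follows somewhere later in the text.

```php
<?php
// Assumed variant of the plugin's entry pattern: language code required,
// followed by a lookahead that demands a closing </lang> later on.
const LANG_ENTRY =
    '/\x3Clang\s+[a-z\-A-Z0-9]{2,}\s*\x3E\s*(?=(?s).*?\x3C\x2Flang\x3E)/';

// Hypothetical helper: return the matched opening tag, or NULL if the
// text contains no usable opening tag.
function find_lang_entry($text) {
    return preg_match(LANG_ENTRY, $text, $hits) ? $hits[0] : null;
}

// opening tag found, the closing tag may be on a later line
var_dump(find_lang_entry("Ein <lang de>deutscher\nSatz</lang>."));
// no closing tag, so the opening tag is not treated as markup
var_dump(find_lang_entry('Ein <lang de>unclosed tag.'));
// no language code, so nothing matches
var_dump(find_lang_entry('Ein <lang>Satz</lang>.'));
```

Markup that fails this first check stays ordinary text; the stricter checks on the language code itself happen later in handle().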

Changes

2007-08-15:
* added GPL link and fixed some doc problems;

2007-01-05:
* minor internal changes (added comments, date updated);

2005-09-04:
+ initial release;

Matthias Watermann 2007-08-15

See also

Plugins by the same author

Discussion

Hints, comments, suggestions …

Doesn't seem to work too well in Internet Explorer.

Don't worry: M$IE is well known for not caring about standards8). Trying to work around the various bugs of that awful program9) is an endless business.

Word 2003 has an option to manually insert phonetics above specified words…

I was wondering if it was possible to create a module or plugin for DokuWiki that does the following for Koine-Greek: a) allows the user to upload a two column wordlist; first column source text, second column phonetic text. b) specify the fonts for the source and phonetic text. c) Have the DokuWiki, automatically recognize the words from the source text on any text [as one types] and auto-insert and center the phonetic text ABOVE each (tagged) occurrence…

An optional button to insert tags on selected text would be great also, not to mention Unicode capability for the source text column, and the option to configure both language and fonts as per source text and phonetic output, if necessary.

Thanx a million…

Please contact keith (at) pm-intl (.) org

Keith

See bounties for such requests.

Suggestion: Add dir="rtl" to the span tag for RTL languages. It can possibly be determined by $lang['direction'] in lang.php of that language.
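
A minimal sketch of how this suggestion might look, purely as an illustration: the functions lang_direction and lang_open_tag are hypothetical and not part of the plugin, and $langRoot is assumed to point at a directory laid out like DokuWiki's inc/lang/, with one subdirectory per language holding a lang.php that sets $lang['direction']. Only the primary part of the language code is looked up, which is a simplification.

```php
<?php
// Hypothetical helper: look up the text direction for a language code.
// Falls back to 'ltr' whenever the direction cannot be determined.
function lang_direction($code, $langRoot) {
    // Only the primary subtag is used; it is validated before it
    // becomes part of a file path.
    if (!preg_match('/^([a-z]{2,3})(?:-|$)/i', $code, $hits)) {
        return 'ltr';
    }
    $file = rtrim($langRoot, '/') . '/' . strtolower($hits[1]) . '/lang.php';
    if (!is_file($file)) {
        return 'ltr';
    }
    $lang = array();
    include $file; // assumed to set $lang['direction']
    return (isset($lang['direction']) && 'rtl' === $lang['direction'])
        ? 'rtl' : 'ltr';
}

// Hypothetical helper: build the opening span tag and add dir="rtl"
// only for right-to-left languages.
function lang_open_tag($code, $langRoot) {
    $attr = htmlspecialchars($code, ENT_QUOTES);
    $dir = ('rtl' === lang_direction($code, $langRoot)) ? ' dir="rtl"' : '';
    return '<span lang="' . $attr . '" xml:lang="' . $attr . '"' . $dir . '>';
}
```

In the plugin itself such a lookup would belong in the DOKU_LEXER_ENTER branch of render(), where the opening span tag is written.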


Unfortunately, headline code, e.g. “== headline ==”, is not interpreted as headline code but printed as raw code, i.e. the “==” are printed and no headline is generated. The same is true for the language tag of the wrap tool.
Rolf Hemmerling 2009-12-23 10:00


1) for instance consider writing quotes in their respective native language
2) i.e. their device/software accessing such a document and possibly providing some accessibility aids like switching fonts or quote characters or using another voice for reading or …
3) such as paragraphs, list items, table cells etc.
4) i.e. the placement of the <lang ...> markup in regard to the surrounding text and newlines
5) I've used German here since I'm a German ;-)
6) The comments within the source file are suitable for the OSS doxygen tool, a documentation system for C++, C, Java, Objective-C, Python, IDL and to some extent PHP, C#, and D. Since I'm working with different programming languages, it's very convenient to have one tool that handles the docs for all of them.
7) obsoleted by incorporating its ability into the Code plugin
8) at least those they don't own
9) for need of a more adequate designation
plugin/lang.txt · Last modified: 2013-08-02 19:47 by Klap-in