Metadata Storage

If the content stored in a wiki page is data, then things like the time of the last update, who updated it, the file size, etc. can all be regarded as metadata for the wiki page. This page describes where and how such additional data is stored in DokuWiki.

Storage

DokuWiki does not store all metadata in one central place (like a database or registry). Some metadata is simply a property of the page's data file itself (e.g. file size, last modified date); all other metadata is kept by DokuWiki in the meta directory, in the .meta file corresponding to the wiki page name.

Metadata Renderer

Info in the meta directory is initially written by the metadata renderer. It creates a parallel file for each page named $id.meta in the meta directory. The file is a serialized multi-dimensional PHP array whose keys follow the Dublin Core element names.
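Because the file is just a serialized PHP array, it can be inspected with plain unserialize(). The following is a minimal, self-contained sketch under stated assumptions: it writes a small made-up sample array to a temporary file instead of reading a real $id.meta file, and the helper name example_read_meta_file is hypothetical, not part of DokuWiki.

```php
<?php
// Hypothetical helper (not part of DokuWiki): load a .meta file into an array.
function example_read_meta_file($file)
{
    $raw = @file_get_contents($file);
    if ($raw === false) {
        return array();                 // file missing or unreadable
    }
    $meta = @unserialize($raw);
    return is_array($meta) ? $meta : array();
}

// Made-up sample data standing in for the contents of a real $id.meta file.
$sample = array(
    'current' => array(
        'title'   => 'Metadata Storage',
        'creator' => 'Alice',
        'date'    => array('created' => 1318795020, 'modified' => 1318795020),
    ),
    'persistent' => array(
        'creator' => 'Alice',
    ),
);

// Write the sample the same way the data is stored: as a serialized array.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, serialize($sample));

$meta = example_read_meta_file($file);
echo $meta['current']['title'], "\n";
unlink($file);
```

Inside DokuWiki itself, the get and set functions described on this page are the better way to access these values; reading the file directly is mainly useful for debugging.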

The renderer is not invoked when a page is saved. Instead, it runs in one of the following cases:

  • when the page is accessed and its metadata doesn't exist yet
  • when the cache renderer (called in p_cached_output in p_wiki_xhtml in html_show in tpl_content in the template) determines that the cache is outdated
  • through the indexer, when the page was added by means other than DokuWiki (e.g. a script)

In the current development version the rendering process is triggered differently. There is a new cache file for the metadata (it just contains a timestamp), and metadata is rendered whenever the cache logic determines that it is needed. This works like the xhtml renderer cache: disabling the xhtml cache also disables the metadata cache, except that metadata for the current page is always rendered only once per page load.

This also means that the xhtml cache no longer triggers metadata rendering automatically, as it did in previous versions. Plugin authors who use metadata properties to store, e.g., the status of linked pages should use the metadata renderer cache event to determine cache validity: in the xhtml renderer cache event they will get the already updated metadata, whereas reading the metadata in the metadata cache handler gives them the old metadata. The xhtml cache itself depends on the metadata file, so metadata rendering automatically triggers xhtml rendering whenever the metadata has changed (if it has not changed, the metadata file isn't saved).

Functions to Get and Set Metadata

There are two functions in inc/parserutils.php to deal with metadata:

  • p_get_metadata($id, $key, $render) returns a metadata value for a page.
    • $id is the ID of a wiki page; required
    • $key is the name of the metadata item to be retrieved; optional, default is false. If empty, an array of all the metadata items is returned.
    • $render boolean, whether or not the page metadata should be generated by the renderer if no metadata exists; optional, default is false. Development version only: the parameter determines whether the page metadata should be generated by the renderer when the metadata cache indicates that the cached metadata shouldn't be used and p_get_metadata isn't called from within p_get_metadata; default is true. Set it to false when you request metadata for many pages in a row, as this function can trigger the parsing and rendering of the requested page.
  • p_set_metadata($id, $data, $render, $persistent) sets some properties in the metadata.
    • $id is the ID of a wiki page; required
    • $data is an array with key ⇒ value pairs to be set in the metadata; required
    • $render boolean, whether or not the page metadata should be generated with the renderer; optional, default is false
    • $persistent a boolean which indicates whether or not the particular metadata value will persist through the next metadata rendering. The default value is true.
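As a rough illustration of how the two functions might be called, the sketch below wraps them in two small helpers. It assumes the code runs inside DokuWiki (where inc/parserutils.php provides both functions); the helper names and the 'example_status' key are hypothetical and only serve the example.

```php
<?php
// Sketch only: these helpers must run inside DokuWiki, where
// p_get_metadata() and p_set_metadata() are defined.
// Helper names and the 'example_status' key are made up for illustration.

// Return the page title from the metadata, falling back to the page ID.
function example_page_title($id)
{
    // $render = false: do not trigger parsing/rendering for this lookup
    $title = p_get_metadata($id, 'title', false);
    return ($title !== null && $title !== false && $title !== '') ? $title : $id;
}

// Store a custom value that should persist through the next metadata rendering.
function example_mark_reviewed($id, $reviewer)
{
    $data = array('example_status' => array('reviewed_by' => $reviewer));
    // $render = false, $persistent = true
    return p_set_metadata($id, $data, false, true);
}
```

Passing false for $render keeps a loop over many pages from parsing and rendering each page it touches.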

Data Structure

Currently, the following metadata is saved by the core metadata renderer:

  • 'title' – string, first heading
  • 'creator' – string, full name of the user who created the page
  • 'description' – array
    • 'abstract' – raw text abstract (250 to 500 chars) of the page
    • 'tableofcontents' – array, list of header id ('hid'), title ('title'), list item type ('type') and header level ('level')
  • 'contributor' – array, list of user ID ⇒ full name of users who have contributed to the page
  • 'date' – array
    • 'created' – timestamp, creation date
    • 'modified' – timestamp, date of last non-minor modification
    • 'valid'
      • 'age' – period in seconds before the page should be refreshed (used by 'rss' syntax only)
  • 'last_change' – array, the last changelog entry
    • 'date' – timestamp, date of the last change
    • 'ip' – ip of the user editing
    • 'type' – type of the edit (C create, E edit, e minor edit, D delete, R revert)
    • 'id' – id of the page
    • 'user' – username of the user editing
    • 'sum' – edit summary entered by the editor
    • 'extra' – extra data, used for storing the revision (timestamp) in the case of a revert
  • 'relation' – array
    • 'isreferencedby' – array, list of pages that link to this page: ID ⇒ boolean exists; this is not used or written by DokuWiki core
    • 'references' – array, list of linked pages: ID ⇒ boolean exists
    • 'firstimage' – id or url of the first image in the page
    • 'haspart' – array, list of included rss feeds (and more, see below)
  • 'internal' – array
    • 'cache' – boolean, if the cache may be used
    • 'toc' – boolean, if the toc shall be displayed

Additionally, plugins can support more metadata elements. Currently used:

  • 'relation' – array
    • 'ispartof' – array, list of pages that include the current page: ID ⇒ boolean exists (include plugin)
    • 'haspart' – array, list of included pages: ID ⇒ boolean exists (include plugin) or rss feeds
  • 'subject' – array, list of tags (tag plugin, blogtng plugin, flattr plugin); this is used by feed.php, if present
  • 'type' – string, 'draft' for drafts (blog plugin)
  • 'geo' – array, list of geographic tags (geotag, openlayersmap plugin)

It's recommended to use keys from the Dublin Core element set for any metadata that might be interesting for external use.

This data is stored in an associative array with two keys: 'current' for all current data (including the persistent data), and 'persistent' for data that is to be kept across metadata renderings.

Metadata Persistence

Internally DokuWiki maintains two arrays of metadata, current & persistent. The persistent array holds duplicates of those key/values which should not be cleared during the rendering process. All requests for metadata values using p_get_metadata() are met using the current array.

Examples of persistent metadata keys are:

  • 'creator'
  • 'contributor'
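The effect of keeping two arrays can be illustrated with a simplified model. This is not DokuWiki's actual rendering code; it only assumes, as described above, that values in 'persistent' survive a re-render while all other 'current' values are regenerated. The function name and the sample values are hypothetical.

```php
<?php
// Simplified model of a metadata re-render (NOT DokuWiki's real implementation).
// $stored   : array with 'current' and 'persistent' keys, as kept for a page
// $rendered : values freshly produced by the renderer for that page
function example_rerender(array $stored, array $rendered)
{
    // Start from the persistent values, then apply the renderer's output.
    // Old 'current' values that are neither persistent nor re-rendered are dropped.
    $current = array_replace_recursive($stored['persistent'], $rendered);
    return array('current' => $current, 'persistent' => $stored['persistent']);
}

$stored = array(
    'current'    => array('title' => 'Old title', 'creator' => 'Alice', 'stale' => 'x'),
    'persistent' => array('creator' => 'Alice'),
);
$result = example_rerender($stored, array('title' => 'New title'));
print_r($result['current']);
```

In this model 'creator' survives because it is persistent, 'title' is replaced by the freshly rendered value, and 'stale' disappears because nothing regenerates it.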

Metadata and Plugins

In addition to the get and set metadata functions mentioned above, there are two other mechanisms plugins can use to interact with metadata.

Syntax Plugins can create metadata for the current page in their render() method by handling the case $format == 'metadata'. Metadata key/value pairs are added to the $renderer->meta array, and persistent values are additionally added to the $renderer->persistent array.
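A minimal sketch of that metadata branch is shown below as a standalone function, so it can be read and run without a full plugin class. In a real syntax plugin this logic would sit inside the render() method; the function name, the 'example_plugin' key and the layout of $data are assumptions made for illustration.

```php
<?php
// Hypothetical helper mirroring the metadata branch of a syntax plugin's
// render() method. 'example_plugin' and the $data layout are made up.
function example_render($format, $renderer, array $data)
{
    if ($format !== 'metadata') {
        return false;               // other formats (e.g. xhtml) are handled elsewhere
    }
    // Regenerated on every metadata rendering: only in the meta array.
    $renderer->meta['example_plugin']['items'] = $data['items'];
    // Should survive re-rendering: written to both arrays.
    $renderer->meta['example_plugin']['owner'] = $data['owner'];
    $renderer->persistent['example_plugin']['owner'] = $data['owner'];
    return true;
}
```

Nesting everything under one plugin-specific key is a design choice of this sketch; it keeps the plugin's values from colliding with core keys such as 'title' or 'relation'.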

Action Plugins can register for the PARSER_METADATA_RENDER event to inspect or modify metadata before or after metadata rendering.

Metadata index

FIXME this should be further extended.

Since the 2011-05-25 (“Rincewind”) release there is an index in which metadata properties can be stored. It is organized similarly to the fulltext index and uses the same page list, but each indexed metadata property has its own word indexes, named $metaname_w.idx, $metaname_i.idx and $metaname_p.idx. DokuWiki itself currently indexes the properties relation_references and title. Plugins can add their own metadata keys, and it is also possible to add arbitrary data to the index. This can be done with the INDEXER_PAGE_ADD event. Plugins need to make sure they add themselves to the indexer version using the INDEXER_VERSION_GET event. All metadata indexes are recorded in the metadata.idx index so that deleted pages can be removed from all metadata indexes.

The indexer object (which can be obtained by using idx_get_indexer) supports the following methods for metadata:

  • addMetaKeys($page, $key, $value=null) - adds one or more metadata entries to a page (normally this should be done using INDEXER_PAGE_ADD but if plugins want to update the index explicitly and immediately this function can be used)
  • lookupKey($key, &$value, $func=null) - looks up all pages where a certain metadata key has the specified value. It is possible to pass multiple values as an array; in that case an array with the matches for each value is returned. Additionally, a comparison function like preg_match can be passed in the $func parameter.
  • getPages($key=null) - returns the indexed pages; if the $key parameter is set, only pages where the metadata key $key is set to at least one value are returned.

Example for getting all backlinks of a certain page:

$result = idx_get_indexer()->lookupKey('relation_references', $id);

(Note that this functionality, combined with an ACL check, is available as ft_backlinks($id).)
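The other lookups follow the same pattern. The sketch below wraps two of them in helper functions; it assumes the code runs inside DokuWiki, where idx_get_indexer() is available, and the helper names are hypothetical. No ACL check is performed here, so callers would have to filter the results themselves.

```php
<?php
// Sketch only: must run inside DokuWiki, which provides idx_get_indexer().
// The helper names are made up for illustration.

// Pages whose indexed 'title' property equals $title.
function example_pages_with_title($title)
{
    // lookupKey() takes the value by reference, so pass a variable.
    return idx_get_indexer()->lookupKey('title', $title);
}

// Pages that have at least one value indexed for 'relation_references',
// i.e. pages that link to at least one other page.
function example_pages_with_links()
{
    return idx_get_indexer()->getPages('relation_references');
}
```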

devel/metadata.txt · Last modified: 2011/10/16 21:57 by 91.50.177.111
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported