Metadata Storage
If the content stored in a wiki page is data, things like the time of last update, who updated it, the filesize etc. could all be regarded as metadata for the wiki page. This page describes where and how such additional data is stored in DokuWiki.
Plugins can also use metadata for their own purposes. Apart from storing obvious metadata for the page, it can hold data used to decide whether a cache may be used, or settings such as whether a certain plugin feature should be enabled on a page.
Storage
DokuWiki does not store all metadata in a central place (like a database or registry). Some metadata are simply properties of the page's own data file (e.g. file size, last modified date); the rest is kept by DokuWiki in the meta directory, in the .meta file corresponding to the wiki page name. There is also an index in which selected metadata can be searched.
Metadata Renderer
Info in the meta directory is initially written by the metadata renderer. It creates a parallel file for each page, named <pageid>.meta, in the meta directory. The file is a serialized multi-dimensional PHP array whose keys follow the Dublin Core element names.
Data Structure
Currently, the following metadata is saved by the core metadata renderer:
- title – string, first heading
- creator – string, full name of the user who created the page
- user – string, the login name of the user who created the page
- description – array
  - abstract – raw text abstract (250 to 500 chars) of the page
  - tableofcontents – array, list of arrays with header id ('hid'), title ('title'), list item type ('type') and header level ('level')
- contributor – array, list of user ID ⇒ full name of users who have contributed to the page
- date – array
  - created – timestamp, creation date
  - modified – timestamp, date of last non-minor modification
  - valid – array
    - age – seconds, period in seconds before the page should be refreshed (used by 'rss' syntax only)
- last_change – array, the last changelog entry
  - date – timestamp, date of the last change
  - ip – IP of the user editing
  - type – type of the edit (C create, E edit, e minor edit, D delete, R revert)
  - id – id of the page
  - user – username of the user editing
  - sum – summary of the edit
  - extra – extra data, used for storing the revision (timestamp) in the case of a revert
- relation – array
  - isreferencedby – array, list of pages that link to this page: ID ⇒ boolean exists; this is not used or written by DokuWiki core
  - references – array, list of linked pages: page ID ⇒ boolean exists
  - media – array, list of linked media files: media ID ⇒ boolean exists
  - firstimage – id or url of the first image in the page
  - haspart – array, list of included rss feeds (and more, see below)
- internal – array
  - cache – boolean, if the cache may be used
  - toc – boolean, if the toc shall be displayed
Additionally, plugins can support more metadata elements. Currently used:
- relation – array
- type – string, 'draft' for drafts (blog plugin)
-
  - lat – number, latitude of this location in decimal degrees
  - lon – number, longitude of this location in decimal degrees
  - alt – number, altitude in meters above sea level
  - region – string, region of this location, e.g. a province or state
  - country – string, the country of this location
  - placename – string, placename describing this location or area
  - geohash – string, geohash of this location
It's recommended to use keys from the Dublin Core element set for any metadata that might be interesting for external use.
For plugin internal data it is recommended to store your keys under the plugin key:
- plugin – array, contains keys for all plugins storing metadata
  - yourplugin – array, the keys you need for your plugin
All of this metadata is stored in an associative array with two keys: 'current' for all current data (including the persistent data), and 'persistent' for data that shall be kept across metadata rendering.
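As a rough illustration of this layout, the following sketch builds a small array with 'current' and 'persistent' keys and round-trips it through PHP's serialize() and unserialize(). All values (names, timestamps, page IDs) are invented for the example, and the code only mirrors the structure described here. It is not how DokuWiki itself reads the file; plugins would normally go through p_get_metadata() instead.

```php
<?php
// Hypothetical sample data: every value below is invented for illustration.
$meta = [
    'current' => [
        'title'       => 'Metadata Storage',
        'creator'     => 'Jane Doe',
        'contributor' => ['jdoe' => 'Jane Doe'],
        'date'        => ['created' => 1300000000, 'modified' => 1300003600],
        'relation'    => ['references' => ['wiki:syntax' => true]],
    ],
    // Only values that must survive a re-render are duplicated here.
    'persistent' => [
        'creator'     => 'Jane Doe',
        'contributor' => ['jdoe' => 'Jane Doe'],
    ],
];

// A .meta file holds the serialized form of such an array.
$serialized = serialize($meta);

// Reading it back yields the same nested structure.
$restored      = unserialize($serialized);
$firstHeading  = $restored['current']['title'];
$linksToSyntax = $restored['current']['relation']['references']['wiki:syntax'];
```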
Metadata Persistence
Internally DokuWiki maintains two arrays of metadata, current and persistent. The persistent array holds duplicates of those key/values which should not be cleared during the rendering process. All requests for metadata values using p_get_metadata() are met using the current array.
Examples of persistent metadata keys are:
- 'creator'
- 'contributor'
Running of metadata rendering
Metadata rendering is started only by p_get_metadata() and p_set_metadata(). This differs from the xhtml renderer. Parsing a wiki page has two stages: the Handler generates the instructions, and a renderer then produces output (such as xhtml) from those instructions. Like all renderers, the metadata renderer uses the same instructions as input. Inside the metadata renderer, the metadata can be accessed directly at renderer->meta and renderer->persistent. Some examples and a bit of explanation can be found in the syntax plugins development documentation.
The metadata renderer also creates a short raw text abstract. The abstract is built from the rendered instructions by appending compact text without html to $this->doc. Check $this->capture to see whether the renderer is still collecting text for the abstract.
// capture only the first few sections.
// Is switched off as well by e.g. section handling in the metadata renderer
if ($this->capture) {
    if ($linktitle) {
        $this->doc .= $linktitle;
    } else {
        $this->doc .= '<' . $url . '>';
    }
}
The timing therefore differs from that of the xhtml renderer: it depends on the render flags passed to p_get_metadata() and on the cache status. The aim is to guarantee that the metadata renderer runs when needed, but not unnecessarily. Read more about render flags in Functions to Get and Set Metadata below.
Metadata and Plugins
There are two ways for plugins to interact with metadata rendering:
- Syntax Plugins can create metadata for the rendered page in their render() method by handling the $format == "metadata" case. The current metadata can be accessed and modified in the renderer->meta array, and persistent values are in the renderer->persistent array. When persistent metadata is modified, the copy of it in the current metadata should be modified, too.
- Action Plugins can register for the PARSER_METADATA_RENDER event to inspect or modify metadata before or after metadata rendering.
Persistent metadata can also be set at any time using the p_set_metadata function that is described below. Current metadata should only be set in the context of the renderer, as it will be overwritten the next time metadata is rendered.
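The syntax plugin approach can be sketched as follows. To keep the example runnable on its own, the renderer is replaced by a small stand-in class that models only the meta and persistent arrays named in the text, and render() is written as a plain function. The class name, the function name, the plugin_example key and its sub-keys are all hypothetical and not part of DokuWiki.

```php
<?php
// Stand-in for the metadata renderer: only the two public arrays named in
// the text are modelled. Inside DokuWiki the real renderer is passed in.
class FakeMetadataRenderer
{
    public $meta = [];
    public $persistent = [];
}

// Hypothetical body of a syntax plugin's render() method, written as a
// plain function so it can run on its own. 'plugin_example' is invented.
function example_render($format, $renderer, array $data)
{
    if ($format !== 'metadata') {
        return false; // other output formats are not handled in this sketch
    }

    // Current metadata: rebuilt on every metadata rendering.
    $renderer->meta['plugin_example']['tags'] = $data['tags'];

    // Persistent metadata: update the copy in the current array as well.
    $renderer->persistent['plugin_example']['owner'] = $data['owner'];
    $renderer->meta['plugin_example']['owner'] = $data['owner'];

    return true;
}

$renderer = new FakeMetadataRenderer();
$handled  = example_render('metadata', $renderer, ['tags' => ['howto'], 'owner' => 'jdoe']);
$ignored  = example_render('xhtml', $renderer, ['tags' => [], 'owner' => '']);
```

Writing the persistent value to both arrays keeps the current metadata consistent with the persistent copy within the same rendering run.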
Metadata can be retrieved using the p_get_metadata function that is described below. Plugins can also add metadata to the metadata index and search the indexed metadata. This is used in the tag plugin.
Note that persistent metadata is never cleaned up and is always used as the basis for the current metadata. When switching a plugin from persistent to non-persistent metadata, make sure you implement a cleanup routine that removes your plugin's persistent metadata wherever it exists. For the same reason, non-persistent metadata should be preferred whenever possible.
If you want to make sure that your plugin's metadata doesn't interfere with other plugins or DokuWiki itself, consider using plugin_$plugin as prefix/top level key (especially for persistent metadata; current metadata that fits in the Dublin Core element set should be stored as outlined above).
It is very difficult to cleanly update persistent metadata properties that are arrays from various places: in most cases you don't know which entries are old metadata that should be cleaned up and which come from other plugins and should be kept (or not, because the plugin was disabled). For this case, consider using keys that are unique to your plugin and merging them manually into the current metadata using the PARSER_METADATA_RENDER event. That way you can, for example, store custom tags in the persistent metadata and add them to the subject metadata. Your plugin's metadata then also stops being used when your plugin is disabled.
Functions to Get and Set Metadata
There are two functions in inc/parserutils.php to deal with metadata:
- p_get_metadata($id, $key, $render) returns a metadata value for a page.
  - $id is the ID of a wiki page; required
  - $key is the name of the metadata item to be retrieved. Defaults to false. If empty, an array of all the metadata items is returned. To retrieve items that are stored in sub-arrays, separate the keys of the different levels by spaces, like relation references for the data stored in $meta['relation']['references'] in the renderer.
  - $render int, determines whether the page metadata should be generated by the renderer when the metadata cache indicates that it shouldn't be used and p_get_metadata isn't being called from within p_get_metadata. There are several possibilities:
    - METADATA_DONT_RENDER means the metadata won't be generated/updated on request. Use this when you request metadata for a lot of pages in a row, as p_get_metadata can trigger the parsing and rendering of the requested page.
    - METADATA_RENDER_USING_CACHE is the default. It uses the standard DokuWiki caching system; the behavior can be changed using the PARSER_CACHE_USE event. Below you can find more details on metadata and caching.
    - METADATA_RENDER_SIMPLE_CACHE means a much simpler caching is used: it only considers the modification time of the page and can't be changed using plugins. Use this when you request very simple properties of the page, like its title.
    - METADATA_RENDER_UNLIMITED means that metadata for an unlimited number of pages should be rendered. Normally only P_GET_METADATA_RENDER_LIMIT (default: 5) pages are rendered for metadata in one request. This should be used in locations like the cli indexer, where time doesn't really matter but metadata should always be fresh. This option can be combined with either of the previous two options using a bitwise OR (|).
    - false is interpreted as METADATA_DONT_RENDER (this parameter used to be a boolean before the 2011-05-25 release)
    - true is interpreted as METADATA_RENDER_USING_CACHE
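The space-separated key notation can be pictured with a small stand-alone helper. This is only an illustration of how a key such as relation references maps onto nested array levels. The function lookup_meta_key() is hypothetical and is not DokuWiki's implementation, and returning null for a missing path is an assumption of this sketch. In a wiki you would simply call p_get_metadata($id, 'relation references').

```php
<?php
// Illustrative helper, NOT DokuWiki's implementation: resolves a
// space-separated key such as 'relation references' in a nested array.
function lookup_meta_key(array $meta, $key)
{
    if ($key === '' || $key === false) {
        return $meta; // empty key: return all metadata items
    }
    $value = $meta;
    foreach (explode(' ', $key) as $part) {
        if (!is_array($value) || !array_key_exists($part, $value)) {
            return null; // assumption: a missing path yields null here
        }
        $value = $value[$part];
    }
    return $value;
}

// Invented sample of "current" metadata for one page.
$current = [
    'title'    => 'Start',
    'relation' => ['references' => ['wiki:syntax' => true, 'wiki:missing' => false]],
];

$title = lookup_meta_key($current, 'title');
$links = lookup_meta_key($current, 'relation references');
$none  = lookup_meta_key($current, 'relation media');
```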
- p_set_metadata($id, $data, $render, $persistent) sets some properties in the metadata. It uses the metadata inside the renderer when there is a renderer for the specified page.
  - $id is the ID of a wiki page; required
  - $data is an array with key => value pairs to be set in the metadata; required. Note that here the keys are only keys for the top level. If the key is description, date or contributor, the value is expected to be an array and is merged with the existing data. If the key is relation, all sub-keys are merged when there is existing array data for them. Other keys are not merged as arrays but simply stored as values, which overwrites any existing sub-keys.
  - $render boolean, whether or not the page metadata should be generated with the renderer before the metadata is set; optional, default is false
  - $persistent a boolean which indicates whether or not the particular metadata value will persist through the next metadata rendering. The default value is true.
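The merge rule for $data can be modelled in a few lines. The sketch below is a simplified stand-alone model of the rule as described here, not DokuWiki's code: the function merge_meta() is hypothetical, "merged" is modelled with array_replace() as an assumption, and the distinction between current and persistent data is left out.

```php
<?php
// Illustrative model of the merge rule described above, NOT DokuWiki's code.
// Assumption: "merged" is modelled with array_replace().
function merge_meta(array $existing, array $data)
{
    $arrayKeys = ['description', 'date', 'contributor'];
    foreach ($data as $key => $value) {
        if ($key === 'relation' && is_array($value)) {
            foreach ($value as $subkey => $subvalue) {
                $old = $existing['relation'][$subkey] ?? null;
                // Sub-keys are merged only when array data already exists.
                $existing['relation'][$subkey] = is_array($old)
                    ? array_replace($old, (array) $subvalue)
                    : $subvalue;
            }
        } elseif (in_array($key, $arrayKeys, true) && is_array($value)) {
            $existing[$key] = array_replace((array) ($existing[$key] ?? []), $value);
        } else {
            $existing[$key] = $value; // plain overwrite, sub-keys included
        }
    }
    return $existing;
}

// Invented sample data; 'plugin_example' is a hypothetical plugin key.
$existing = [
    'date'           => ['created' => 100, 'modified' => 200],
    'relation'       => ['references' => ['a' => true]],
    'plugin_example' => ['x' => 1, 'y' => 2],
];
$result = merge_meta($existing, [
    'date'           => ['modified' => 300],
    'relation'       => ['references' => ['b' => false]],
    'plugin_example' => ['x' => 9],
]);
```

In this model the plugin_example array is replaced wholesale, which is why a plugin that keeps several values under one top-level key needs to pass the complete array each time.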
Metadata and caching
In general, metadata is rendered on demand when p_get_metadata is called. This normally happens right after the redirect that follows saving a page, but also from time to time when the cache expires, when a plugin expires it using the PARSER_CACHE_USE event, or when caching has been disabled in the renderer (but at most once per request). The cache file itself stores only a timestamp. The timestamp is always updated when metadata is rendered; the .meta file is updated only when the metadata actually changed (the xhtml cache depends on it, so that cache is only refreshed when really needed).
When metadata is requested inside the metadata cache handler, the old metadata is returned. That way you can compare new data to the old stored metadata in order to decide whether to use the cache or not. In the xhtml cache handler you get the new metadata, but since the xhtml cache depends on the metadata, the xhtml is updated whenever you change the metadata.
In versions prior to 2011, metadata was only rendered when the xhtml was rendered. Back then you got the old metadata in the xhtml cache handler; plugins that still rely on this need to be updated.
Metadata index
Since the 2011-05-25 (“Rincewind”) release there is an index where metadata properties can be stored. It is organized in a similar manner to the fulltext index and uses the same page list, but separate word indexes for each indexed metadata property; they are named $metaname_w.idx, $metaname_i.idx and $metaname_p.idx. In DokuWiki itself, the properties relation_references and title are currently indexed. Plugins can add their own metadata keys, and it is also possible to add arbitrary data to the index. This can be done with the INDEXER_PAGE_ADD event. Plugins need to make sure they add themselves to the indexer version using the INDEXER_VERSION_GET event; the index of a page is re-created when this version differs from the version with which it was indexed before. All metadata indexes are recorded in the metadata.idx index so that deleted pages can be removed from all metadata indexes.
The data is updated right after the fulltext index, so it can be regenerated in the same way. When a plugin wants to force an update of the index of a certain page, it can delete the .indexed meta file of that page (the index is not automatically updated when metadata is changed, but only when the page itself is changed).
The indexer object (which can be obtained by using idx_get_indexer) supports the following methods for metadata:
- addMetaKeys($page, $key, $value=null) - adds one or more metadata entries to a page (normally this should be done using INDEXER_PAGE_ADD, but if plugins want to update the index explicitly and immediately this function can be used)
- lookupKey($key, &$value, $func=null) - for looking up all pages where a certain metadata key has the specified value. It is possible to pass multiple keys as array, then an array with matches for each key is returned. Additionally, with the $func parameter it is possible to pass a comparison function like preg_match.
- getPages($key=null) - if the $key parameter is set, only pages where the metadata key $key is set to at least one value are returned.
Example for getting the ids of all pages that link to a certain page:
$result = idx_get_indexer()->lookupKey('relation_references', $id);
(Note that this functionality, including an ACL check, is available as ft_backlinks($id).)
For more advanced queries (like getting all values stored for a certain metadata property), it may be necessary to access the index files directly using idx_getIndex. Feel free to suggest additional features for the metadata index in the bug tracker.
The tag plugin uses the metadata index: its helper part contains examples of how the index can be queried, and its action part shows how the index is written.