====== Metadata Storage ======

If the content stored in a wiki page is //data//, things like the time of the last update, who updated it, the file size etc. can all be regarded as //metadata// for the wiki page. This page describes where and how such additional data is stored in DokuWiki.

===== Storage =====

DokuWiki does not store all metadata in a central place (like a database or registry). Some metadata are simply properties of the page's own data file (e.g. file size, last modified date). The rest is kept by DokuWiki in the ''meta'' directory, in the ''.meta'' file corresponding to the wiki page name.

===== Metadata Renderer =====

Information in the ''meta'' directory is initially written by the metadata renderer. It creates a parallel file for each page named ''$id.meta'' in the ''meta'' directory. The file contains a serialized multi-dimensional PHP array whose keys follow the [[http://dublincore.org/|Dublin Core]] element names.

The renderer is not invoked when a page is saved. Instead, it runs:
  * when the page is accessed and its metadata doesn't exist yet,
  * when the cache renderer (called in ''p_cached_output'' in ''p_wiki_xhtml'' in ''html_show'' in ''tpl_content'' in the template) determines that the cache is outdated, or
  * through the [[:indexer]] when the page was added by means other than DokuWiki (e.g. a script).

In the current [[develonly|development version]] the rendering process is triggered differently. There is a new cache file for the metadata (it just contains a timestamp), and metadata is rendered when the cache logic determines that it is needed. This works like the xhtml renderer cache: disabling the xhtml cache also disables the metadata cache, except that the metadata for the current page is always rendered just once per page load.
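The on-disk format described above can be illustrated with a small standalone PHP sketch. It relies only on what this page states: a ''.meta'' file holds one serialized PHP array with a ''current'' and a ''persistent'' branch (see //Data Structure// below). The page name, the temporary file and all values are made up for illustration; DokuWiki itself derives the file location from the page ID.

```php
<?php
// Standalone sketch of the .meta file format. The page, the file
// location and all values are hypothetical; DokuWiki derives the real
// path from the page ID inside its 'meta' directory.

// Hypothetical metadata of a page 'wiki:start'.
$meta = [
    'current' => [
        'title'       => 'Welcome',
        'creator'     => 'Jane Doe',
        'contributor' => ['jdoe' => 'Jane Doe'],
        'date'        => ['created' => 1300000000, 'modified' => 1300003600],
        'relation'    => ['references' => ['wiki:syntax' => true, 'wiki:missing' => false]],
    ],
    'persistent' => [
        'creator'     => 'Jane Doe',
        'contributor' => ['jdoe' => 'Jane Doe'],
    ],
];

// Store it the way a .meta file holds it: one serialized PHP array.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, serialize($meta));

// Reading it back restores the nested array, including value types.
$loaded = unserialize(file_get_contents($file));
unlink($file);

// Lookups for a page go to the 'current' branch.
echo $loaded['current']['title'], PHP_EOL;            // Welcome
echo $loaded['current']['date']['modified'], PHP_EOL; // 1300003600
```

Because the whole array is serialized in one piece, nested values such as ''date'' or ''relation'' come back exactly as they were written.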
The new scheme also means that the xhtml cache no longer triggers metadata rendering automatically, as it did in previous versions. Plugin authors who use metadata properties to store e.g. the status of linked pages should use the metadata renderer cache event to determine cache validity: in the xhtml renderer cache event they already get the updated metadata, while reading the metadata in the metadata cache handler gives them the old metadata. The xhtml cache itself depends on the metadata file, so metadata rendering automatically triggers xhtml rendering when the metadata has changed (unchanged metadata is not saved).

==== Functions to Get and Set Metadata ====

There are two functions in ''inc/parserutils.php'' to deal with metadata:

  * ''[[xref>p_get_metadata]]($id, $key, $render)'' returns a metadata value for a page.
    * ''$id'' is the ID of a wiki page; required
    * ''$key'' is the name of the metadata item to be retrieved; optional, default is false. If empty, an array of all the metadata items is returned.
    * ''$render'' boolean, whether or not the page metadata should be generated by the renderer if no metadata exists; optional, default is false. [[develonly]]: the parameter determines whether the page metadata should be generated by the renderer when the metadata cache indicates that it shouldn't be used and ''p_get_metadata'' isn't called from within ''p_get_metadata''; default is true. Set it to false when you request metadata for many pages in a row, as this function can trigger the parsing and rendering of the requested page.
  * ''[[xref>p_set_metadata]]($id, $data, $render, $persistent)'' sets some properties in the metadata.
    * ''$id'' is the ID of a wiki page; required
    * ''$data'' is an array with key => value pairs to be set in the metadata; required
    * ''$render'' boolean, whether or not the page metadata should be generated with the renderer; optional, default is false
    * ''$persistent'' boolean, whether or not the given metadata values will persist through the next metadata rendering; optional, default is true

==== Data Structure ====

Currently, the following metadata is saved by the core metadata renderer:

  * 'title' -- string, first heading
  * 'creator' -- string, full name of the user who created the page
  * 'description' -- array
    * 'abstract' -- raw text abstract (250 to 500 chars) of the page
    * 'tableofcontents' -- array, list of header id ('hid'), title ('title'), list item type ('type') and header level ('level')
  * 'contributor' -- array, list of user ID => full name of users who have contributed to the page
  * 'date' -- array
    * 'created' -- timestamp, creation date
    * 'modified' -- timestamp, date of last non-minor modification
    * 'valid'
      * 'age' -- seconds, period in seconds before the page should be refreshed (used by 'rss' syntax only)
  * 'last_change' -- array, the last changelog entry
    * 'date' -- timestamp, date of the last change
    * 'ip' -- IP of the user editing
    * 'type' -- type of the edit (C create, E edit, e minor edit, D delete, R revert)
    * 'id' -- ID of the page
    * 'user' -- username of the user editing
    * 'sum' -- edit summary
    * 'extra' -- extra data, used for storing the revision (timestamp) in the case of a revert
  * 'relation' -- array
    * 'isreferencedby' -- array, list of pages that link to this page: ID => boolean exists; this is not used or written by DokuWiki core
    * 'references' -- array, list of linked pages: ID => boolean exists
    * 'firstimage' -- ID or URL of the first image in the page
    * 'haspart' -- array, list of included RSS feeds (and more, see below)
  * 'internal' -- array
    * 'cache' -- boolean, if the cache may be used
    * 'toc' -- boolean, if the TOC shall be displayed

Additionally, plugins can support more metadata elements. Currently used:

  * 'relation' -- array
    * 'ispartof' -- array, list of pages that include the current page: ID => boolean exists ([[plugin:include]] plugin)
    * 'haspart' -- array, list of included pages: ID => boolean exists ([[plugin:include]] plugin) or RSS feeds
  * 'subject' -- array, list of tags ([[plugin:tag]] plugin, [[plugin:blogtng]] plugin, [[plugin:flattr]] plugin); this is used by ''feed.php'', if present
  * 'type' -- string, 'draft' for drafts ([[plugin:blog]] plugin)
  * 'geo' -- array, list of geographic tags ([[plugin:geotag]], [[plugin:openlayersmap]] plugin)

It's recommended to use keys from the [[http://dublincore.org/documents/dces/|Dublin Core element set]] for any metadata that might be interesting for external use.

This data is stored in an associative array with two keys: 'current' for all current data (including the persistent data) and 'persistent' for data that shall be kept over metadata rendering.

==== Metadata Persistence ====

Internally DokuWiki maintains two arrays of metadata, ''current'' and ''persistent''. The ''persistent'' array holds duplicates of those key/value pairs which should not be cleared during the rendering process. All requests for metadata values using ''p_get_metadata()'' are met using the ''current'' array. Examples of persistent metadata keys are:

  * 'creator'
  * 'contributor'

==== Metadata and Plugins ====

In addition to the get and set metadata functions mentioned above, there are two other mechanisms plugins can use to interact with metadata.

[[Syntax Plugins]] can create metadata for the //current// page in their ''render()'' method by handling ''$format=="metadata"''. Metadata key/value pairs are added to the ''renderer%%->%%meta'' array, and persistent values are also added to the ''renderer%%->%%persistent'' array.
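The interplay of ''renderer%%->%%meta'' and ''renderer%%->%%persistent'' can be modelled outside DokuWiki. The following is a simplified sketch, not DokuWiki's renderer: ''StubMetadataRenderer'' and ''plugin_render()'' are hypothetical names, ''example_flag'' is a made-up key, and the assumption that a rendering pass starts from the persistent values only is this sketch's reading of the persistence rules described above.

```php
<?php
// Simplified model of one metadata rendering pass. StubMetadataRenderer
// and plugin_render() are hypothetical stand-ins, not DokuWiki classes.

final class StubMetadataRenderer
{
    /** @var array what p_get_metadata() would see ("current") */
    public $meta = [];
    /** @var array what survives the next rendering pass ("persistent") */
    public $persistent = [];

    public function __construct(array $stored)
    {
        $this->persistent = $stored['persistent'] ?? [];
    }

    // Assumed behaviour: a pass starts from the persistent values only,
    // so everything else in 'current' is rebuilt from the page source.
    public function startPass(): void
    {
        $this->meta = $this->persistent;
    }

    public function result(): array
    {
        return ['current' => $this->meta, 'persistent' => $this->persistent];
    }
}

// What a syntax plugin's render() might do for $format == 'metadata'.
function plugin_render(string $format, StubMetadataRenderer $renderer, array $data): bool
{
    if ($format !== 'metadata') {
        return false; // other output formats are not handled here
    }
    foreach ($data['tags'] as $tag) {
        $renderer->meta['subject'][] = $tag;      // current only: rebuilt each pass
    }
    $renderer->meta['example_flag'] = true;       // hypothetical key...
    $renderer->persistent['example_flag'] = true; // ...that also persists
    return true;
}

$stored = [
    'current'    => ['title' => 'Old title', 'subject' => ['stale'], 'creator' => 'Jane Doe'],
    'persistent' => ['creator' => 'Jane Doe'],
];

$renderer = new StubMetadataRenderer($stored);
$renderer->startPass();
plugin_render('metadata', $renderer, ['tags' => ['php', 'wiki']]);
$result = $renderer->result();

print_r($result['current']);
```

After the pass, the stale tag and the old title are gone from the current data (in DokuWiki the core renderer would set the title again from the first heading), while ''creator'' and the hypothetical ''example_flag'' survive because they are also in the persistent array.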
[[Action Plugins]] can register for the [[.event:parser_metadata_render|PARSER_METADATA_RENDER]] event to inspect or modify metadata before or after metadata rendering.

===== Metadata index =====

FIXME this should be further extended.

Since the 2011-05-25 ("Rincewind") release there is an index where metadata properties can be stored. It is organized similarly to the [[fulltextindex]] and uses the same page list, but separate word indexes for each indexed metadata property, named ''$metaname_w.idx'', ''$metaname_i.idx'' and ''$metaname_p.idx''. DokuWiki itself currently indexes the properties ''relation_references'' and ''title''. Plugins can add their own metadata keys, and it is also possible to add arbitrary data to the index. This can be done with the [[devel:event:INDEXER_PAGE_ADD]] event. Plugins need to make sure they add themselves to the indexer version using the [[devel:event:INDEXER_VERSION_GET]] event. All metadata indexes are recorded in the ''metadata.idx'' index, so deleted pages can be removed from all metadata indexes.

The indexer object (which can be obtained with ''idx_get_indexer()'') supports the following methods for metadata:

  * **''addMetaKeys($page, $key, $value=null)''** - adds one or more metadata entries to a page (normally this should be done using [[devel:event:INDEXER_PAGE_ADD]], but if plugins want to update the index explicitly and immediately, this function can be used)
  * **''lookupKey($key, &$value, $func=null)''** - looks up all pages where a certain metadata key has the specified value. It is possible to pass multiple values as an array; then an array with the matches for each value is returned. Additionally, the ''$func'' parameter can be used to pass a comparison function like ''preg_match''.
  * **''getPages($key=null)''** - returns the list of indexed pages; if the ''$key'' parameter is set, only pages where the metadata key ''$key'' is set to at least one value are returned.
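To show how the three methods relate, here is a toy in-memory model. ''ToyMetaIndex'' is hypothetical and ignores the real on-disk ''_w''/''_i''/''_p'' index files; it only reproduces the behaviour listed above, and only for a single key per ''addMetaKeys()'' call. The order in which the comparison function receives its arguments (search value first, indexed value second, so that ''preg_match'' can be passed directly) is an assumption of this sketch.

```php
<?php
// Toy in-memory model of the indexer's metadata methods. ToyMetaIndex is
// hypothetical and NOT DokuWiki's indexer; it only mirrors the behaviour
// described above (single-key form of addMetaKeys() only).

final class ToyMetaIndex
{
    /** @var array<string, array<string, string[]>> key => page => values */
    private $data = [];

    public function addMetaKeys(string $page, string $key, $value = null): void
    {
        // Replace the values stored for this page and key.
        $this->data[$key][$page] = array_map('strval', (array) $value);
    }

    // Pages where $key has $value. With an array of values, the result is
    // keyed by value. $func (assumed call order) gets ($value, $indexedValue).
    public function lookupKey(string $key, &$value, ?callable $func = null): array
    {
        $result = [];
        foreach ((array) $value as $v) {
            $result[$v] = [];
            foreach ($this->data[$key] ?? [] as $page => $stored) {
                foreach ($stored as $word) {
                    $hit = $func !== null ? (bool) $func($v, $word) : ($v === $word);
                    if ($hit) {
                        $result[$v][] = $page;
                        break; // one match per page is enough
                    }
                }
            }
        }
        return is_array($value) ? $result : $result[$value];
    }

    // All known pages, or only those with at least one value for $key.
    public function getPages(?string $key = null): array
    {
        if ($key !== null) {
            return array_keys(array_filter($this->data[$key] ?? [], 'count'));
        }
        $pages = [];
        foreach ($this->data as $byPage) {
            $pages = array_merge($pages, array_keys($byPage));
        }
        return array_values(array_unique($pages));
    }
}

$index = new ToyMetaIndex();
$index->addMetaKeys('wiki:start', 'subject', ['wiki', 'howto']);
$index->addMetaKeys('wiki:syntax', 'subject', ['wiki']);
$index->addMetaKeys('blog:post1', 'subject', ['php']);

$tag = 'wiki';
$pages = $index->lookupKey('subject', $tag);   // ['wiki:start', 'wiki:syntax']

$tags = ['wiki', 'php'];
$byTag = $index->lookupKey('subject', $tags);  // ['wiki' => [...], 'php' => ['blog:post1']]

$pattern = '/^how/';
$regex = $index->lookupKey('subject', $pattern, 'preg_match'); // ['wiki:start']

$tagged = $index->getPages('subject');
```

Because ''$value'' is taken by reference, the query values must be passed as variables, not literals. Grouping the results by query value lets a caller resolve several tags with one call.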
Example for getting all backlinks of a certain page:

<code php>
$result = idx_get_indexer()->lookupKey('relation_references', $id);
</code>

(Note that this functionality, plus an ACL check, is available as ''ft_backlinks($id)''.)
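Conceptually, the ''relation_references'' index answers a question that would otherwise require a full scan over every page's ''relation''/''references'' metadata. The sketch below performs that scan on hypothetical in-memory data; ''backlinks_by_scan()'' is an illustrative helper, not part of DokuWiki, and it does not perform the ACL check that ''ft_backlinks()'' adds.

```php
<?php
// Brute-force equivalent of the backlink lookup: scan every page's
// 'relation' => 'references' metadata (ID => boolean exists) and collect
// the pages that link to $id. backlinks_by_scan() is hypothetical and
// does no ACL filtering.

function backlinks_by_scan(array $allMeta, string $id): array
{
    $result = [];
    foreach ($allMeta as $page => $meta) {
        $refs = $meta['current']['relation']['references'] ?? [];
        // Links to missing pages (value false) still count as references.
        if (array_key_exists($id, $refs)) {
            $result[] = $page;
        }
    }
    sort($result);
    return $result;
}

// Hypothetical metadata of three pages.
$allMeta = [
    'wiki:start'  => ['current' => ['relation' => ['references' => ['wiki:syntax' => true, 'wiki:todo' => false]]]],
    'wiki:faq'    => ['current' => ['relation' => ['references' => ['wiki:syntax' => true]]]],
    'wiki:syntax' => ['current' => []],
];

print_r(backlinks_by_scan($allMeta, 'wiki:syntax'));
```

The scan reads the metadata of every page on each call, which is the cost the metadata index avoids.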