Metadata Storage

If a wiki page carries data such as the date of the last update, the person who made that update, the file size, and so on, all of this information can be regarded as metadata of the wiki page. This page describes where and how such additional data is stored in DokuWiki.

Plugins can also use metadata for purposes other than storing the obvious metadata of a page, for example to determine when the cache may be used, or to configure which plugin features should be enabled on a page.

Storage

DokuWiki does not store all metadata in a single place (such as a database or registry). Some metadata are simply properties of the data file itself (for example the file size or the last modification date); the other metadata are kept by DokuWiki in the meta directory, in the .meta file that corresponds to the name of the wiki page. There is also an index in which selected metadata can be searched.

Metadata Renderer

The information in the meta directory is initially written by the metadata renderer. It creates a parallel file for each page, named <pageid>.meta, in the meta directory. The file is a serialized multi-dimensional PHP array whose keys follow the Dublin Core standard.
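
As a rough illustration of that file format, the sketch below serializes a small nested array to a temporary file and reads it back with PHP's unserialize(). The file location and the sample values are made up for the example, and the 'current'/'persistent' top-level layout is the one described further down this page. A real .meta file lives in DokuWiki's meta directory and is written by the metadata renderer, not by hand.

```php
<?php
// Sample metadata; the keys follow the structure described on this page,
// the values are invented for illustration.
$meta = [
    'current' => [
        'title'   => 'Example page',
        'creator' => 'Jane Doe',
        'date'    => ['created' => 1559984040, 'modified' => 1559984040],
    ],
    'persistent' => [
        'creator' => 'Jane Doe',
    ],
];

// Hypothetical location: a temp file stands in for <pageid>.meta.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, serialize($meta));

// Reading the file back yields the same nested array.
$loaded = unserialize(file_get_contents($file));
unlink($file);

echo $loaded['current']['title'], "\n";
```

Plugins would normally not read the file this way; the p_get_metadata() function described later on this page is the supported access path.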

Data Structure

Currently, the following metadata is saved by the core metadata renderer:

  • title – string, first heading
  • creator – string, full name of the user who created the page
  • user – string, login name of the user who created the page
  • description – array
    • abstract – raw text abstract of the page (250 to 500 characters)
    • tableofcontents – array, list of header id ('hid'), title ('title'), list item type ('type') and header level ('level')
  • contributor – array, list of user ID ⇒ full name of the users who have contributed to the page
  • date – array
    • created – timestamp, creation date
    • modified – timestamp, date of the last non-minor modification
    • valid
      • age – seconds, period after which the page must be refreshed (used only by the 'rss' syntax)
  • last_change – array, the last changelog entry
    • date – timestamp, date of the last modification
    • ip – IP address of the user who edited the page
    • type – type of edit (C create, E edit, e minor edit, D delete, R revert)
    • id – ID of the page
    • user – user name of the user who edited the page
    • sum – edit summary
    • extra – additional data, used to store the revision (timestamp) in case of a revert
  • relation – array
    • isreferencedby – array, list of pages that link to this page: ID ⇒ boolean exists; not used or written by the DokuWiki core
    • references – array, list of linked pages: page ID ⇒ boolean exists
    • media – array, list of linked media files: media ID ⇒ boolean exists
    • firstimage – ID or URL of the first image in the page
    • haspart – array, list of included rss feeds (see below for more)
  • internal – array
    • cache – boolean, whether the cache should be used
    • toc – boolean, whether the table of contents should be displayed

In addition, plugins may support other metadata. Keys currently in use are:

  • relation – array
    • haspart – array, list of included pages: ID ⇒ boolean exists (include plugin), or rss feeds
    • odt – array, list of properties of the ODT plugin
      • template – media id of ODT-file used as template
  • subject – array, lists of tags (tag plugin, blogtng plugin, flattr plugin); this is used by feed.php, if present
  • type – string, 'draft' for drafts (blog plugin)
  • geo – array, list of geographic tags (geotag, openlayersmap, socialcards and spatialhelper plugins)
    • lat – number, latitude of this location in decimal degrees
    • lon – number, longitude of this location in decimal degrees
    • alt – number, altitude in meters above sea level
    • region – string, region of this location, e.g. a province or state
    • country – string, the country of this location
    • placename – string, placename describing this location or area
    • geohash – string, geohash of this location

It's recommended to use keys from the Dublin Core element set for any metadata that might be interesting for external use.

For plugin internal data it is recommended to store your keys under the plugin key:

  • plugin – array, contains keys for all plugins storing metadata
    • yourplugin – array, the keys you need for your plugin

This data is stored in an associative array with two keys: 'current' for all current data (including persistent data) and 'persistent' for data that must be kept across metadata rendering.
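
A minimal sketch of that layout with plugin data stored under the recommended plugin key follows. The plugin name yourplugin comes from the list above; the helper function plugin_meta() and the 'enabled' setting are hypothetical and exist only for this example.

```php
<?php
// Invented sample data using the 'current' / 'persistent' layout.
$meta = [
    'current' => [
        'title'  => 'Example page',
        'plugin' => [
            // Everything a plugin stores goes under its own sub-key.
            'yourplugin' => ['enabled' => true],
        ],
    ],
    'persistent' => [],
];

/**
 * Hypothetical helper: return the current metadata of one plugin,
 * or an empty array when the plugin stored nothing for this page.
 */
function plugin_meta(array $meta, string $plugin): array
{
    return $meta['current']['plugin'][$plugin] ?? [];
}

var_dump(plugin_meta($meta, 'yourplugin'));
var_dump(plugin_meta($meta, 'otherplugin'));
```

Keeping each plugin under its own sub-key means a missing plugin simply yields an empty array instead of colliding with another plugin's keys.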

Metadata Persistence

Internally DokuWiki maintains two arrays of metadata, current & persistent. The persistent array holds duplicates of those key/values which should not be cleared during the rendering process. All requests for metadata values using p_get_metadata() are met using the current array.

Examples of persistent metadata keys are:

  • 'creator'
  • 'contributor'
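
The relationship between the two arrays can be pictured with the conceptual model below. It is not DokuWiki's code: the function rebuild_current() is hypothetical, and it assumes that a rendering pass starts from the persistent values and then adds whatever the renderer produces.

```php
<?php
/**
 * Conceptual model only: build the 'current' array for a new rendering pass.
 * Assumption: persistent values form the basis and freshly rendered values
 * are laid on top; DokuWiki's real logic is more involved.
 */
function rebuild_current(array $persistent, array $rendered): array
{
    return array_replace($persistent, $rendered);
}

$persistent = [
    'creator'     => 'Jane Doe',
    'contributor' => ['jane' => 'Jane Doe'],
];

// State before re-rendering: one non-persistent plugin value is present.
$oldCurrent = $persistent + [
    'title'  => 'Old title',
    'plugin' => ['demo' => ['flag' => true]],
];

// The new pass produces a title but no 'plugin' data any more.
$rendered   = ['title' => 'New title'];
$newCurrent = rebuild_current($persistent, $rendered);

var_dump(isset($oldCurrent['plugin'])); // was present before
var_dump(isset($newCurrent['plugin'])); // non-persistent key is gone
echo $newCurrent['creator'], "\n";      // persistent key survived
```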

Running of metadata rendering

Metadata rendering is started only by p_get_metadata() and p_set_metadata(), which differs from the xhtml renderer. Parsing a wiki page has two stages: the Handler generates the instructions, and a renderer then produces output (for example xhtml) from these instructions. Like all renderers, the metadata renderer takes the same instructions as input. Inside the metadata renderer, the metadata can be accessed directly at $renderer->meta and $renderer->persistent. Some examples and a bit of explanation can be found in the syntax plugins development documentation.

The metadata renderer also creates a short raw-text abstract. The abstract is built from the rendered instructions by appending compact text without html to $this->doc. Check $this->capture to see whether the renderer is still collecting text for the abstract.

// Capture text only for the first few sections.
// Capturing is also switched off elsewhere, e.g. by the metadata renderer's section handling.
if ($this->capture) {
    if ($linktitle) {
        $this->doc .= $linktitle;
    } else {
        $this->doc .= '<' . $url . '>';
    }
}

The timing therefore differs from that of the xhtml renderer: it depends on the render flags passed to p_get_metadata() and on the cache status. The aim is to guarantee that the metadata renderer runs when needed, but not unnecessarily. Read more about render flags in Functions to Get and Set Metadata below.

Metadata and Plugins

There are two ways for plugins to interact with metadata rendering:

  • Syntax Plugins can create metadata for the rendered page in their render() method by handling $format == "metadata". The current metadata can be accessed and modified in the $renderer->meta array, and persistent values are in the $renderer->persistent array. When persistent metadata is modified, its copy in the current metadata should be modified too.
  • Action Plugins can register for the PARSER_METADATA_RENDER event to inspect or modify metadata before or after metadata rendering.
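
The first route might look roughly like the sketch below. To keep it runnable outside DokuWiki, the class FakeMetadataRenderer is a stand-in that models only the meta and persistent arrays, and yourplugin_render() is a hypothetical free function playing the role of a syntax plugin's render() method. The 'color' and 'pinned' settings are invented.

```php
<?php
// Stand-in for the metadata renderer: only the two arrays named in the
// text are modelled. This is NOT DokuWiki's renderer class.
class FakeMetadataRenderer
{
    public array $meta = [];
    public array $persistent = [];
}

// Hypothetical render() body of a syntax plugin called "yourplugin".
// $data is whatever the plugin's handler produced for this match.
function yourplugin_render(string $format, object $renderer, array $data): bool
{
    if ($format !== 'metadata') {
        return false; // other formats are not handled in this sketch
    }

    // Current metadata: rebuilt on every rendering pass.
    $renderer->meta['plugin']['yourplugin']['color'] = $data['color'];

    if (!empty($data['pinned'])) {
        // Persistent metadata: update the persistent array and mirror the
        // change in the current array, as recommended above.
        $renderer->persistent['plugin']['yourplugin']['pinned'] = true;
        $renderer->meta['plugin']['yourplugin']['pinned'] = true;
    }
    return true;
}

$renderer = new FakeMetadataRenderer();
yourplugin_render('metadata', $renderer, ['color' => 'red', 'pinned' => true]);
print_r($renderer->meta);
```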

Persistent metadata can also be set at any time using the p_set_metadata function described below. Current metadata should only be set in the context of the renderer, as it will be overwritten the next time metadata is rendered.

Metadata can be retrieved using the p_get_metadata function that is described below. Plugins can also add metadata to the metadata index and search the indexed metadata. This is used in the tag plugin.

Note that persistent metadata is never cleaned and is always used as the basis for the current metadata. When switching from persistent to non-persistent metadata in a plugin, make sure you implement a cleanup routine that removes your plugin's persistent metadata wherever it exists. For this reason, non-persistent metadata should be preferred whenever possible.

If you want to make sure that your plugin's metadata doesn't interfere with other plugins or DokuWiki itself, consider using plugin_$plugin as the prefix or top-level key. This applies especially to persistent metadata; current metadata that fits in the Dublin Core element set should be stored as outlined above.

It is very difficult to cleanly update persistent metadata properties that are arrays from various places: in most cases you don't know which entries are old metadata that should be cleaned up and which come from other plugins and should be kept (or not, because the plugin was disabled). For this case, consider using keys that are unique to your plugin and merging them manually into the current metadata using the PARSER_METADATA_RENDER event. That way you can, for example, store custom tags in the persistent metadata and add them to the subject metadata. Your plugin's metadata then also stops being used when your plugin is disabled.
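
The merging step could be sketched as follows. This is a simplified model: mytags_merge_subject() is a hypothetical function that takes and returns the data array, it assumes the array carries the 'current' and 'persistent' keys described earlier, and plugin_mytags is a made-up persistent key. A real handler would be registered for PARSER_METADATA_RENDER in an action plugin.

```php
<?php
/**
 * Hypothetical handler body for PARSER_METADATA_RENDER.
 * Assumption: $data holds the 'current' and 'persistent' arrays, and the
 * plugin keeps its own tags under the made-up key 'plugin_mytags'.
 */
function mytags_merge_subject(array $data): array
{
    $own     = $data['persistent']['plugin_mytags'] ?? [];
    $subject = $data['current']['subject'] ?? [];

    // Add the plugin's tags to 'subject' without duplicating existing ones.
    $data['current']['subject'] = array_values(
        array_unique(array_merge($subject, $own))
    );
    return $data;
}

$data = [
    'current'    => ['subject' => ['wiki', 'php']],
    'persistent' => ['plugin_mytags' => ['php', 'metadata']],
];

$data = mytags_merge_subject($data);
print_r($data['current']['subject']);
```

Because the merge only happens while the handler runs, the custom tags stay in their own persistent key and drop out of subject once the plugin is disabled.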

Functions to Get and Set Metadata

There are two functions in inc/parserutils.php to deal with metadata:

  • p_get_metadata($id, $key, $render) returns a metadata value for a page.
    • $id is the ID of a wiki page; required
    • $key is the name of the metadata item to be retrieved. Defaults to false. If empty, an array of all the metadata items is returned. To retrieve items that are stored in sub-arrays, separate the keys of the different levels with spaces, e.g. 'relation references' for the data stored in $meta['relation']['references'] in the renderer.
    • $render int, determines whether the page metadata should be generated by the renderer when the metadata cache indicates that the cached data shouldn't be used and the call is not nested inside another p_get_metadata call. There are several possibilities:
      • METADATA_DONT_RENDER means the metadata won't be generated or updated on request. Use this when you request metadata for a lot of pages in a row, as p_get_metadata can otherwise trigger the parsing and rendering of each requested page.
      • METADATA_RENDER_USING_CACHE is the default. It uses the standard DokuWiki caching system, and its behavior can be changed using the PARSER_CACHE_USE event. Below you can find more details on metadata and caching.
      • METADATA_RENDER_USING_SIMPLE_CACHE means much simpler caching is used: it only considers the modification time of the page and can't be changed by plugins. Use this when you request very simple properties of the page, like its title.
      • METADATA_RENDER_UNLIMITED means that metadata for an unlimited number of pages should be rendered. Normally only P_GET_METADATA_RENDER_LIMIT (default: 5) pages are rendered for metadata in one request. This should be used in places like the cli indexer, where time doesn't really matter but metadata should always be fresh. This option can be combined with either of the previous two options using bitwise OR (|).
      • false is interpreted as METADATA_DONT_RENDER (this parameter used to be a boolean before the 2011-05-25 release)
      • true is interpreted as METADATA_RENDER_USING_CACHE
  • p_set_metadata($id, $data, $render, $persistent) sets some properties in the metadata. It uses the metadata inside the renderer when there is a renderer for the specified page.
    • $id is the ID of a wiki page; required
    • $data is an array of key => value pairs to be set in the metadata; required. Note that the keys here are top-level keys only. If the key is description, date or contributor, the value is expected to be an array and is merged with the existing data. If the key is relation, all sub-keys are merged when there is existing array data for them. Other keys are not merged as arrays but simply stored as values, which may overwrite existing sub-keys.
    • $render boolean, whether or not the page metadata should be generated with the renderer before the metadata is set; optional, default is false
    • $persistent boolean, indicates whether or not the particular metadata value will persist through the next metadata rendering. The default value is true.
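
The space-separated key scheme of p_get_metadata can be illustrated with the sketch below. The function meta_lookup() is hypothetical and is not DokuWiki's implementation; it only shows how a key such as 'relation references' maps onto nested array levels. The page ID wiki:start and the sample data are invented, and the final call is guarded so the sketch also runs outside DokuWiki.

```php
<?php
/**
 * Illustration of the space-separated key scheme, not DokuWiki's own code:
 * walk a nested array one level per key segment.
 */
function meta_lookup(array $meta, string $key)
{
    if ($key === '') {
        return $meta; // empty key: return everything
    }
    $value = $meta;
    foreach (explode(' ', $key) as $segment) {
        if (!is_array($value) || !array_key_exists($segment, $value)) {
            return null; // path does not exist
        }
        $value = $value[$segment];
    }
    return $value;
}

$meta = [
    'title'    => 'Example page',
    'relation' => [
        'references' => ['wiki:syntax' => true, 'wiki:missing' => false],
    ],
];

var_dump(meta_lookup($meta, 'relation references'));

// Inside DokuWiki the corresponding request would be a call like this one;
// METADATA_DONT_RENDER avoids triggering a rendering pass.
if (function_exists('p_get_metadata')) {
    $refs = p_get_metadata('wiki:start', 'relation references', METADATA_DONT_RENDER);
}
```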

Metadata and caching

In general, metadata is rendered on demand when p_get_metadata is called. This normally happens right after the redirect that follows saving a page, but also from time to time when the cache expires, when a plugin expires it using the PARSER_CACHE_USE event, or when caching has been disabled in the renderer (but at most once per request). The cache file itself stores only a timestamp. The timestamp is updated every time metadata is rendered, whereas the .meta file is updated only when the metadata actually changed; the xhtml cache depends on the .meta file, so this way it is updated only when really needed.

When metadata is requested inside the cache handler, the old metadata is returned, so you can compare new data with the old stored metadata to decide whether to use the cache. In the xhtml cache handler you get the new metadata, but because the xhtml cache depends on the metadata, the xhtml is updated whenever you change the metadata.

In versions prior to 2011, metadata was only rendered when the xhtml was rendered. Back then you got the old metadata in the xhtml cache handler; plugins that still rely on this need to be updated.

Metadata index

Since the 2011-05-25 (“Rincewind”) release there is an index in which metadata properties can be stored. It is organized in a similar way to the fulltext index and uses the same page list, but separate word indexes for each indexed metadata property; these are named $metaname_w.idx, $metaname_i.idx and $metaname_p.idx. DokuWiki itself currently indexes the properties relation_references and title. Plugins can add their own metadata keys, and it is also possible to add arbitrary data to the index. This is done with the INDEXER_PAGE_ADD event. Plugins need to make sure they add themselves to the indexer version using the INDEXER_VERSION_GET event, because the index of a page is re-created when this version differs from the version with which it was last indexed. All metadata indexes are recorded in the metadata.idx index so that deleted pages can be removed from all metadata indexes.

The data is updated right after the fulltext index, so it can be regenerated in the same way. When a plugin wants to force an update of the index for a certain page, it can delete the .indexed meta file of that page. The index is not updated automatically when metadata changes, only when the page itself changes.

The indexer object (which can be obtained by using idx_get_indexer) supports the following methods for metadata:

  • addMetaKeys($page, $key, $value=null) - adds one or more metadata entries to a page. Normally this should be done using INDEXER_PAGE_ADD, but this function can be used if a plugin wants to update the index explicitly and immediately.
  • lookupKey($key, &$value, $func=null) - looks up all pages where a certain metadata key has the specified value. It is possible to pass multiple keys as an array; an array with the matches for each key is then returned. Additionally, the $func parameter makes it possible to pass a comparison function like preg_match.
  • getPages($key=null) - if the $key parameter is set, only pages for which the metadata key $key has at least one value are returned.

Example for getting the ids of all pages that link to a certain page:

$result = idx_get_indexer()->lookupKey('relation_references', $id);

(Note that this functionality, including an ACL check, is available as ft_backlinks($id).)

For more advanced queries (like getting all values stored for a certain metadata property) it may be necessary to access the index files directly using idx_getIndex. Feel free to suggest additional features for the metadata index in the bug tracker.

The tag plugin uses the metadata index: its helper component contains examples of how the index can be queried, and its action component shows how the index is written.

fr/devel/metadata.txt · Last modified: 2019-06-08 09:14 by Digitalin

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International