DokuWiki

Statistics in DokuWiki

The code on this page shows how DokuWiki can be extended with its own access log. This is useful when the web server's log files are out of reach and you still need statistics about DokuWiki's usage.

I will describe two versions. The first is the high-end solution and logs access to wiki pages as well as to internal and external media files. The second is for “beginners” and only logs wiki pages.

:!: [note by J.-F. Lalande] Using the information and code provided on this page, I created the logstats plugin, which generates an entry in access.log for each access to a DokuWiki page. Details and downloads are on my logstats plugin page.

Log File Format

Both solutions use the NCSA combined log format (also known as NCSA extended). This format is very popular and is used by web servers such as Apache. Many report generators can read it and build reports from it. Because external programs support it so widely, DokuWiki does not need any built-in reporting functionality of its own.

Some report generators that can be used (only examples, the list is far from complete):

  • AWStats - a nice report generator from France; notes on customising it for DokuWiki are in Appendix B below
  • Webalizer - another well-known report generator

Each log entry is a single line made up of the following fields:

<host> <rfc931> <user> [<timestamp>] "<request>" <error> <filesize> "<referer>" "<agent>"
  • <host> - IP of the client host (we don't do reverse host look-ups)
  • <rfc931> - remote user identification or '-' if not available
  • <user> - user id of authenticated user or '-' if not available
  • <timestamp> - time in format [01/Dec/2005:22:19:12 +0200]
  • <request> - request method (e.g. GET or POST), requested page and protocol
  • <error> - status code returned by the server, e.g. 200 (OK) or 404 (Not Found)
  • <filesize> - size of the wiki page (only the bare text)
  • <referer> - page the user came from. This information depends heavily on the client and is not always available. The logging function does its best to fill in something useful here.
  • <agent> - identifying information that the client browser reports about itself
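To illustrate the field order, here is a minimal sketch that formats one entry in this format. It is not part of the original code. format_combined_line() is a hypothetical name, and the sketch assumes PHP 7.4 or later because it uses arrow functions. It differs from the code on this page in two ways, similar to what Apache does: it writes '-' for an empty referer or agent, and it escapes double quotes inside the quoted fields.

```php
<?php
/**
 * Hypothetical helper: format one access in the NCSA combined log format.
 * <rfc931> is always '-', empty values become '-', and double quotes in
 * the quoted fields are backslash-escaped so the line stays parseable.
 */
function format_combined_line(
    string $host,
    string $user,
    int $time,          // Unix timestamp of the request
    string $method,     // e.g. GET or POST
    string $page,       // requested page, e.g. wiki/start
    string $protocol,   // e.g. HTTP/1.1
    int $status,        // e.g. 200 or 404
    int $size,          // bytes of the page source
    string $referer,
    string $agent
): string {
    $dash = fn(string $v): string => $v === '' ? '-' : $v;
    $quote = fn(string $v): string => str_replace('"', '\\"', $dash($v));
    // Uses PHP's default timezone, e.g. [01/Dec/2005:22:19:12 +0200]
    $stamp = date('[d/M/Y:H:i:s O]', $time);
    return sprintf(
        "%s - %s %s \"%s %s %s\" %d %d \"%s\" \"%s\"\n",
        $host, $dash($user), $stamp, $method, $page, $protocol,
        $status, $size, $quote($referer), $quote($agent)
    );
}
```

With PHP's timezone set to UTC and a timestamp of 0, the timestamp field comes out as [01/Jan/1970:00:00:00 +0000].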

Ultimate Statistics

Ultimate statistics logs access to wiki pages and to internal and external media files. Most of the code lives in logfile.php. This file is about 4 KB, and because I'm not allowed to upload files to this wiki, I added its source code at the end of this page (see Appendix A).

Here is what is needed to get ultimate statistics working:

1. First, copy the file logfile.php to /inc.

2. Next, tell DokuWiki about the log file. This is done in the function init_paths() in /inc/init.php, which resolves a number of path names and stores them in the $conf array. Change init_paths() so that it looks like this:

function init_paths(){
    global $conf;
 
    $paths = array('datadir'   => 'pages',
            'olddir'    => 'attic',
            'mediadir'  => 'media',
            'metadir'   => 'meta',
            'cachedir'  => 'cache',
            'indexdir'  => 'index',
            'lockdir'   => 'locks',
            'tmpdir'    => 'tmp',
            'accesslog' => 'access.log');
 
    foreach($paths as $c => $p){
        if(empty($conf[$c]))  $conf[$c] = $conf['savedir'].'/'.$p;
        $conf[$c]             = init_path($conf[$c]);
        // a missing or unwritable access.log is not fatal: $conf['accesslog'] just stays empty
        if($c != 'accesslog' && empty($conf[$c]))  nice_die("The $c ('$p') does not exist, isn't accessible or writable.
                You should check your config and permission settings.
                Or maybe you want to <a href=\"install.php\">run the
                installer</a>?");
    }
 
[...]

DokuWiki now checks whether the log file access.log exists and is writable. Its path is stored in $conf['accesslog'] only if it does.

3. Next, add a function to inc/template.php to keep the template API consistent. First, though, inc/logfile.php has to be included:

if(!defined('DOKU_INC')) die('meh.');
require_once(DOKU_INC.'inc/logfile.php');
 
[...]
 
/**
 * log this page to a log file
 *
 * @author Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 *
 */
function tpl_logfile(){
    global $ID;
 
    logPageAccess(cleanID($ID));
}

4. Update your template to call tpl_logfile(). The best place is right after the call to tpl_indexerWebBug(), which is usually near the end of your template's main.php:

[...]
 
</div>
<div class="no"><?php /* provide DokuWiki housekeeping */ tpl_indexerWebBug()?></div>
<div class="no"><?php /* do the logging stuff */ tpl_logfile()?></div>  <!-- ADD THIS LINE -->
</body>
</html>

5. To log media files too, /lib/exe/fetch.php needs to be modified. First, include inc/logfile.php again by adding the include statement below all the others already in lib/exe/fetch.php. The call to logMediaAccess() creates the log entry and must be inserted after the media file source has been checked. A good place is near line 60. Put it just before the line with the comment “//check file existance”, so that it runs before the existence check (see the snippet below).

  require_once(DOKU_INC.'inc/logfile.php');
 
  [...]
 
  //log media access
  logMediaAccess($MEDIA, $FILE);
 
  //check file existence

6. Finally, create an empty log file /data/access.log that the web server can write to. The logging routines only write to an already existing log file. If /data/access.log does not exist, nothing is logged. That's it; logging should work now.
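The steps above rely on three small behaviours:

  • the log path is kept only if access.log exists and is writable (step 2)
  • media are logged either as an external URL or as a path built from the media ID (step 5)
  • lines are only ever appended to an existing file (step 6)

The following is a minimal standalone sketch of those behaviours outside DokuWiki. It assumes PHP 7.4 or later. The function names resolve_log_path(), media_log_target() and append_log_line() are hypothetical. The ID cleanup is a simplified stand-in for DokuWiki's cleanID()/utf8_encodeFN(), not their actual code.

```php
<?php
/**
 * Minimal standalone sketch of steps 2, 5 and 6 above, outside DokuWiki.
 * All function names are hypothetical; the real code uses init_path(),
 * prepareID() and io_saveFile().
 */

// Step 2: keep the log path only if the file exists and is writable,
// otherwise return '' so that logging stays switched off.
function resolve_log_path(string $savedir, string $name = 'access.log'): string
{
    $path = rtrim($savedir, '/') . '/' . $name;
    if (!is_file($path) || !is_writable($path)) {
        return '';
    }
    $real = realpath($path);
    return $real === false ? '' : $real;
}

// Step 5: external URLs are logged verbatim (same regex as
// logMediaAccess()), internal media IDs become a path. The cleanup below
// (lower-case, drop empty namespace parts, percent-encode each part) is a
// simplified stand-in for cleanID()/utf8_encodeFN().
function media_log_target(string $media): string
{
    if (preg_match('#^(https?|ftp)://#i', $media)) {
        return $media;
    }
    $parts = array_filter(
        explode(':', strtolower($media)),
        fn(string $p): bool => $p !== ''
    );
    return implode('/', array_map('rawurlencode', $parts));
}

// Step 6: append one line, but only to an already existing log file.
// LOCK_EX is an addition of this sketch to keep concurrent writes apart.
function append_log_line(string $path, string $line): bool
{
    if ($path === '' || !is_file($path)) {
        return false;
    }
    return file_put_contents($path, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

For example, media_log_target('wiki:dokuwiki-128.png') gives wiki/dokuwiki-128.png, while an https:// URL passes through unchanged.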

Basic Statistics

With basic statistics, only access to a wiki page triggers a log entry. If you need media files logged too, please read the section Ultimate Statistics.

Only a few steps are needed to get basic statistics running. First, tell DokuWiki where the log file is or should be by adding the following line to your conf/local.php:

$conf['logfile'] = './data/access.log';  //location of log file

Second, add the following function to inc/template.php:

/**
 * This function writes access information of the current page to a log
 * file. It uses the combined log file format that is also used by the
 * apache web server. A whole bunch of available log analysers could be
 * used to visualize the log.
 *
 * @author Matthias Grimm <matthias.grimm@users.sourceforge.net>
 */
function tpl_logfile(){
    global $ID;
    global $conf;
 
    $exists = false;
    $page = cleanID($ID);
 
    resolve_pageid('', $page, $exists);
    // take the size from the page ID resolved above, so it matches $exists
    $filesize = $exists ? @filesize(wikiFN($page)) : 0;
    $page = str_replace(':','/',$page);
    $page = utf8_encodeFN($page);
 
    $host      = $_SERVER['REMOTE_ADDR'];
    $user      = isset($_SERVER['REMOTE_USER']) ? $_SERVER['REMOTE_USER'] : "-";
    $timestamp = date("[d/M/Y:H:i:s O]");
    $method    = isset($_SERVER['REQUEST_METHOD'])  ? $_SERVER['REQUEST_METHOD']  : "";
    $protocol  = isset($_SERVER['SERVER_PROTOCOL']) ? $_SERVER['SERVER_PROTOCOL'] : "";
    $status    = $exists ? "200 $filesize" : "404 0";
    $agent     = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
    $referer   = $_SERVER['PHP_SELF'];  // script name only; the real referring page is not known here
 
    $logline = "$host - $user $timestamp \"$method $page $protocol\" $status \"$referer\" \"$agent\"\n";
    io_saveFile($conf['logfile'], $logline, true);
}
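The status field written by this function is either "200 <size>" for an existing page or "404 0" otherwise. A minimal standalone sketch of that rule follows. log_status() is a hypothetical name. It checks the file on disk, like getStatus() in Appendix A, instead of using the $exists flag from resolve_pageid().

```php
<?php
/**
 * Hypothetical helper: the "<status> <size>" pair for the log line.
 * Existing file -> "200 <bytes>", missing file -> "404 0".
 */
function log_status(string $file): string
{
    // PHP caches stat() results; clear them so a page saved earlier in
    // the same request reports its current size.
    clearstatcache(true, $file);
    if (is_file($file)) {
        return '200 ' . filesize($file);
    }
    return '404 0';
}
```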

Third and last, add the following line to your template's main.php. A good place is right after the call to the indexer (tpl_indexerWebBug()):

<?php tpl_logfile() ?>

That's it. Browse your wiki for a while and then have a look at the log file: it should contain one line for each page requested.

Appendix A

Source code of logfile.php

logfile.php
<?php
/**
 * DokuWiki logging functions
 *
 * @license    GPL 2 (http://www.gnu.org/licenses/gpl.html)
 * @author     Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 */
 
  if(!defined('DOKU_INC')) define('DOKU_INC',realpath(dirname(__FILE__).'/../').'/');
  require_once(DOKU_CONF.'dokuwiki.php');
 
/**
 * beautify a wiki page id for the log
 *
 * The wiki page id is transformed into a filename-like string and
 * UTF-8 characters are encoded.
 *
 * @param  $path  wiki page id
 *
 * @author Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 */
function prepareID($path){
    $path = cleanID($path);
    $path = str_replace(':','/',$path);
    $path = utf8_encodeFN($path);
    return $path;
}
 
/**
 * checks if a file exists and returns an appropriate web
 * server status
 *
 * @param  $file  complete filepath to check
 *
 * @author Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 */
function getStatus($file){
    if(@file_exists($file)){
      $size = @filesize($file);
      return "200 $size";
    }else
      return "404 0";
}
 
/**
 * logs access to a wiki page
 *
 * @param  $id  page id of the wiki page including namespace
 *
 * @author Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 */ 
function logPageAccess($id){
    global $ACT;
 
    if ($ACT == 'show'){
      $page = prepareID($id);
 
      $crumbs = breadcrumbs();          // get last visited pages
      $crumbs = array_keys($crumbs);   // get raw page IDs 
      array_pop($crumbs);             // skip current page
      $referer = array_pop($crumbs); // get current page's predecessor
      $referer = ($referer) ? prepareID($referer) : '';
 
      logAccess($page,getStatus(wikiFN($id)),$referer);
    }
}
 
/**
 * logs access to a media file (internally or externally)
 *
 * @param  $media   url or dokuwiki path of media
 * @param  $file    full path to the media file
 *
 * @author Matthias Grimm <matthiasgrimm@users.sourceforge.net>
 */
function logMediaAccess($media,$file){
    if(!preg_match('#^(https?|ftp)://#i',$media))
      $media = prepareID($media);
 
    logAccess($media,getStatus($file));
}
 
/**
 * creates a log file entry and writes it to the log
 *
 * This function writes access information of the current page to a log
 * file. It uses the combined log file format that is also used by the
 * apache web server. A whole bunch of available log analysers could be
 * used to visualize the log.
 *
 * @param  $page     page name that was called
 * @param  $status   HTTP status code followed by the file size
 * @param  $referer  predecessor of $page (the page that linked to $page).
 *                   If this field is empty, the function tries to get
 *                   the referer from the web server (HTTP_REFERER)
 *
 * @author Matthias Grimm <matthias.grimm@users.sourceforge.net>
 *
 * combined log file format:
 *     <host> <rfc931> <user> [<timestamp>] "<request>" <error> <filesize>
 *               "<referer>" "<agent>"\n
 *
 * <host>      IP of the client host (we don't do reverse host lookups)
 * <rfc931>    remote user identification or '-' if not available
 * <user>      user id or '-' if not available
 * <timestamp> time in format [01/Dec/2005:22:19:12 +0200]
 * <request>   Requested protocol, for eg. GET or POST, requested page
 *             and protocol
 * <error>     error code from server, for eg. 200 (OK) or 404 (file
 *             not found)
 * <filesize>  size of the wiki page (only the bare text)
 * <referer>   page that linked to this one: $referer if given, otherwise
 *             the page id found in HTTP_REFERER (may be empty)
 * <agent>     identifying information that the client browser reports
 *             about itself
 */
function logAccess($page,$status,$referer=''){
    global $conf;
 
    if (!empty($conf['accesslog'])){
      $host      = $_SERVER['REMOTE_ADDR'];
      $user      = isset($_SERVER['REMOTE_USER']) ? $_SERVER['REMOTE_USER'] : "-";
      $timestamp = date("[d/M/Y:H:i:s O]");
      $method    = isset($_SERVER['REQUEST_METHOD'])  ? $_SERVER['REQUEST_METHOD']  : "";
      $protocol  = isset($_SERVER['SERVER_PROTOCOL']) ? $_SERVER['SERVER_PROTOCOL'] : "";
      $agent     = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
 
      if (empty($referer)){
        if(isset($_SERVER['HTTP_REFERER'])){
          $cnt = preg_match('/\?id=((\w+\:*)+)/i',$_SERVER['HTTP_REFERER'], $match);
          if($cnt == 1)
            $referer = prepareID($match[1]);
        }
      }
 
      $logline = "$host - $user $timestamp \"$method $page $protocol\" $status \"$referer\" \"$agent\"\n";
      io_saveFile($conf['accesslog'], $logline, true);
    }
}
 
//Setup VIM: ex: et ts=2 enc=utf-8 :
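logAccess() falls back to HTTP_REFERER and extracts the page ID with the regular expression '/\?id=((\w+\:*)+)/i'. That pattern has two limitations:

  • it only matches when id comes directly after the '?', so a referer such as doku.php?do=edit&id=start is missed
  • it stops at characters outside \w, such as '-' or '.'

A possible alternative parses the query string instead. It is sketched below assuming PHP 7.0 or later. referer_page() is a hypothetical name. It returns the raw page ID, which would still go through prepareID() like the original match.

```php
<?php
/**
 * Hypothetical replacement for the HTTP_REFERER regex in logAccess():
 * returns the value of the "id" query parameter of a referer URL, or ''
 * if there is none. The result is still a raw page ID (e.g. wiki:syntax).
 */
function referer_page(string $referer): string
{
    // parse_url() returns null if there is no query part and false if
    // the URL is malformed.
    $query = parse_url($referer, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return '';
    }
    // parse_str() decodes %XX sequences and accepts any parameter order.
    parse_str($query, $params);
    $id = $params['id'] ?? '';
    // id[]=... would produce an array; treat it as missing.
    return is_string($id) ? $id : '';
}
```

Inside logAccess(), still guarded by the isset() check, the preg_match() block could then become something like $referer = prepareID(referer_page($_SERVER['HTTP_REFERER']));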

Appendix B - AWStats configuration

If you use AWStats to process your logs, the following configuration tips give you more control over how DokuWiki accesses are shown.

  • By default, AWStats does not keep track of the parameters after a “?” in the URL, so all your wiki accesses appear as hits on the single page “doku.php”. To see which wiki pages are being accessed, enable tracking of parameters:
URLWithQuery=1 # Set this to "1" to enable tracking of URL parameters
URLWithQueryWithOnlyFollowingParameters="id media" # Use this to limit which parameters you are interested in.
  • AWStats lets you produce customised reports. Here are 3 additional custom reports that show the top 10:
    • wiki pages accessed in descending order of popularity
    • wiki media files downloaded in descending order of popularity
    • searches performed using the wiki search box

NOTE: You will have to edit the following lines according to your installation:

  • ExtraSectionCondition* - set the URL paths of doku.php and fetch.php
  • ExtraSectionFirstColumnFormat* - as above
  • MaxNbOfExtra* - set the number of rows you want in the report

NOTE: The results have the following limitations:

  • DokuWiki sometimes uses the POST method rather than GET. Log files do not contain the parameters of POST requests, so these requests are not counted in these reports.
  • The “id=” parameter is overloaded - it does not always represent a request for a particular page.
  • If a user searches without pressing <RETURN> or clicking “Search”, the search is performed entirely via AJAX and does not show up in these reports.

ExtraSectionName1="Wiki Pages"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,\/doku.php"
ExtraSectionFirstColumnTitle1="Page"
ExtraSectionFirstColumnValues1="QUERY_STRING,id=([^&]+)"
ExtraSectionFirstColumnFormat1="<a href="/doku.php?id=%s">%s</a>"
ExtraSectionStatTypes1=PHBL
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=10
MinHitExtra1=1

ExtraSectionName2="Wiki Media Downloads"
ExtraSectionCodeFilter2="200 304"
ExtraSectionCondition2="URL,\/fetch.php"
ExtraSectionFirstColumnTitle2="Document"
ExtraSectionFirstColumnValues2="QUERY_STRING,media=([^&]+)"
ExtraSectionFirstColumnFormat2="<a href="/lib/exe/fetch.php?media=%s">%s</a>"
ExtraSectionStatTypes2=PHBL
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=10
MinHitExtra2=1

ExtraSectionName3="Wiki Searches"
ExtraSectionCodeFilter3="200 304"
ExtraSectionCondition3="QUERY_STRING,do=search&id=([^&]+)"
ExtraSectionFirstColumnTitle3="Search terms"
ExtraSectionFirstColumnValues3="QUERY_STRING,id=([^&]+)"
ExtraSectionFirstColumnFormat3="<a href="/doku.php?do=search&id=%s" target="_blank">%s</a>"
ExtraSectionStatTypes3=PHBL
ExtraSectionAddAverageRow3=0
ExtraSectionAddSumRow3=1
MaxNbOfExtra3=10
MinHitExtra3=1
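To make the first custom section easier to follow, here is a rough standalone equivalent of what it computes:

  • keep 200/304 responses
  • keep requests whose URL contains /doku.php
  • take the id=([^&]+) value from the query string
  • count hits per page

This is not how AWStats itself is implemented, and top_pages() is a hypothetical name. The sketch assumes PHP 7.0 or later and web server log lines in combined format, with requests like GET /doku.php?id=wiki:start HTTP/1.1. As written, these conditions expect such web server logs. The log produced by the code on this page records the page path (e.g. wiki/start) instead.

```php
<?php
/**
 * Rough equivalent of ExtraSection1 ("Wiki Pages"), not AWStats code:
 * count hits per wiki page in web-server log lines (combined format)
 * and return the most requested pages first.
 */
function top_pages(array $lines, int $limit = 10): array
{
    $hits = [];
    foreach ($lines as $line) {
        // Capture the URL of the request and the status code after it.
        if (!preg_match('/"[A-Z]+ (\S+) [^"]*" (\d{3}) /', $line, $m)) {
            continue;
        }
        // ExtraSectionCodeFilter1="200 304"
        if ($m[2] !== '200' && $m[2] !== '304') {
            continue;
        }
        // ExtraSectionCondition1="URL,\/doku.php"
        if (strpos($m[1], '/doku.php') === false) {
            continue;
        }
        // ExtraSectionFirstColumnValues1="QUERY_STRING,id=([^&]+)"
        if (!preg_match('/[?&]id=([^&]+)/', $m[1], $id)) {
            continue;
        }
        $page = urldecode($id[1]);          // decoded here for readability
        $hits[$page] = ($hits[$page] ?? 0) + 1;
    }
    arsort($hits);                          // most hits first
    // MaxNbOfExtra1=10; keep the keys (page IDs) when slicing.
    return array_slice($hits, 0, $limit, true);
}
```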

Discussion

Is it possible to save searches in a log file?

I agree that it would be very helpful to have this in the log file rather than having to check the Apache logs. Dopple 25/08/2009

tips/logging.txt · Last modified: 2016-02-09 08:16 by 193.140.71.8