
Scalability

NEEDSATTENTION NOT EVEN A DRAFT YET

This page is for notes and thoughts on the scalability of DokuWiki, with particular reference to dispelling the assumption that a DBMS is the ideal repository for wiki information. That isn't to say a DBMS is the wrong choice, just that it's not the only choice, and that using the file system in the way that DokuWiki does is in no way a bad choice for a wiki.

Reminder: characteristics of wiki data:

A file system is an extremely efficient place for storing data. That's where a DBMS is likely to put its own data anyway. For simple retrieval a file system should beat the pants off a DBMS. It's quite likely that a well-constructed DBMS application will cache the results of its common queries in the file system for faster retrieval!

Much of DokuWiki's work is simple retrieval - get a file and show its contents, possibly with some processing. This is very well suited to a file system. Databases excel when dealing with highly structured data, and also where small pieces of that data are updated frequently. Wikis aren't like that.
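As a rough illustration of how little work "simple retrieval" involves, here is a minimal Python sketch that maps a page ID to a text file and reads it. It assumes pages live under data/pages with one directory per namespace (so wiki:syntax is stored as data/pages/wiki/syntax.txt). The function names page_path and read_page are made up for this example, and the ID handling is deliberately simplified; it is not DokuWiki's actual code.

```python
from pathlib import Path


def page_path(data_dir, page_id):
    """Translate a page ID such as 'wiki:syntax' into a file path.

    Assumption: each namespace is a directory below <data_dir>/pages and
    the page itself is a .txt file. Real ID cleaning is more involved.
    """
    parts = [part for part in page_id.lower().split(":") if part]
    if not parts:
        raise ValueError("empty page ID")
    *namespaces, name = parts
    return Path(data_dir, "pages", *namespaces, name + ".txt")


def read_page(data_dir, page_id):
    """Fetch the raw wiki text: one path computation and one file read."""
    return page_path(data_dir, page_id).read_text(encoding="utf-8")
```

Serving a page this way needs no query planner and no connection pool; the operating system's file cache does most of the heavy lifting on repeated reads.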

Searching a wiki, or more properly indexing a wiki to enable quick searching, is an area where DBMS technology can assist a wiki. DokuWiki implements its own indexing by creating a list of every “word” of three or more letters in the wiki. Partial-word searching is slower than it would be with a DB, and it is more difficult to handle “advanced searches”. For whole-word searching, however, it is a viable solution (i.e. the results are presented in a reasonable time when compared to other activities on the wiki). For a public wiki it's arguable that Google provides a better indexing service than any internal search could.
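The difference between whole-word and partial searching can be sketched in a few lines of Python. This is only an illustration of the general word-list idea described above, not DokuWiki's real indexer: the in-memory dictionary, the function names and the sample pages are all assumptions made for the example.

```python
import re
from collections import defaultdict

# A "word" here is a run of three or more letters (an assumption that
# mirrors the description above, not DokuWiki's exact tokenizer).
WORD_RE = re.compile(r"[^\W\d_]{3,}")


def build_index(pages):
    """Map each lowercase word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(name)
    return index


def whole_word_search(index, word):
    """Whole-word search is a single dictionary lookup."""
    return sorted(index.get(word.lower(), set()))


def partial_search(index, fragment):
    """Partial search has to scan every word in the index."""
    fragment = fragment.lower()
    hits = set()
    for word, names in index.items():
        if fragment in word:
            hits |= names
    return sorted(hits)


pages = {
    "start": "Welcome to the wiki about scalability",
    "notes": "A file system is scalable and simple",
}
index = build_index(pages)
```

With this sample data, whole_word_search(index, "wiki") touches one dictionary entry, while partial_search(index, "scal") walks the whole word list before it finds both pages. That linear scan is why partial searching gets slower as the wiki's vocabulary grows.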

A DBMS can make it easier to cross the single-server threshold. At its simplest, the DBMS is placed on one box and the web server / application component on another. It's arguable that DokuWiki could accomplish something similar by using NFS mounts to allow multiple servers on the same LAN to access the wiki data directory tree. I am not aware of any DokuWiki installation that has implemented anything similar.

For very large installations where the data will have to reside on multiple servers, instead of solving replication/synchronisation issues the application can hand them off to a suitable DBMS.

Experiences

I'd like to get some experiences with very large installations. Maybe from people who converted existing documents to DokuWiki. Please indicate the DokuWiki version you're using, as DokuWiki continually strives to improve its performance.

What does find ./data -name '*.txt' | wc -l say?
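If find and wc are not available (for example on a Windows host), the same count can be approximated with a short Python sketch. The function name count_pages is made up for this example, and the total size is an extra figure that the shell command above does not report.

```python
from pathlib import Path


def count_pages(data_dir):
    """Return (number of .txt files, total size in bytes) below data_dir."""
    count = 0
    total_bytes = 0
    # rglob walks the directory tree recursively, like find does.
    for path in Path(data_dir).rglob("*.txt"):
        if path.is_file():
            count += 1
            total_bytes += path.stat().st_size
    return count, total_bytes
```

Calling count_pages("./data") from the DokuWiki root returns the two numbers as a tuple. Unlike the find command, this version skips directories whose names happen to end in .txt.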

Note: To give people some idea how to rate your experiences, pure size is not enough. Please post some more information about your environment: which version of DokuWiki you are using, what kind of hardware DokuWiki is running on, and whether it is a shared or dedicated server.

Links

Discussion