Scalability

NEEDSATTENTION NOT EVEN A DRAFT YET

This page collects notes and thoughts on the scalability of DokuWiki, with particular reference to dispelling the assumption that a DBMS is the ideal repository for wiki information. That isn't to say a DBMS is the wrong choice, just that it's not the only choice, and that using the file system the way DokuWiki does is in no way a bad choice for a wiki.

Reminder: characteristics of wiki data:

  • read often, updated less often
    • sounds like an LDAP entry per page, with attributes for source text, meta…
  • mixed and unstructured content

A file system is an extremely efficient place for storing data. That's where a DBMS is likely to put its own data. For simple retrieval a file system should beat the pants off a DBMS. It's quite likely that a well-constructed DBMS application will cache the results of its common queries in the file system for faster retrieval!

Much of DokuWiki's work is simple retrieval: get a file and show its contents, possibly with some processing. This is very well suited to a file system. Databases excel when dealing with highly structured data, and also where small pieces of that data are updated frequently. Wikis aren't like that.
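
To make the "simple retrieval" point concrete, here is a minimal sketch of how a page ID such as devel:scalability could be resolved to a text file and read back. It assumes that namespaces map to directories and pages to .txt files under a data directory; the function names and the exact directory layout are illustrative assumptions, not DokuWiki's actual code.

```python
from pathlib import Path


def page_path(data_dir, page_id):
    """Map a page ID like 'devel:scalability' to a .txt file path.

    Assumption: colon-separated namespaces become directories and the
    last part becomes '<name>.txt'. This mirrors the idea, not the
    real DokuWiki implementation.
    """
    parts = [p for p in page_id.lower().split(":") if p]
    if not parts:
        raise ValueError("empty page ID")
    return Path(data_dir).joinpath(*parts[:-1], parts[-1] + ".txt")


def read_page(data_dir, page_id):
    """Return the raw page text, or None if the page does not exist."""
    path = page_path(data_dir, page_id)
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```

Serving a page is then one path computation and one file read, with no query planner or connection in between, which is the case the paragraph above argues a file system handles well.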

Searching a wiki, or more properly indexing a wiki to enable quick searching, is an area where DBMS technology can assist. DokuWiki implements its own indexing by creating a list of every “word” of three or more letters in the wiki. Partial searching is slower than it would be with a DB, and “advanced searches” are more difficult to handle; for whole-word searching, however, it is a viable solution (i.e. the results are presented in a reasonable time when compared to other activities on the wiki). For a public wiki it's arguable that Google provides a better indexing service than any internal search could.
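
The trade-off between whole-word and partial searching can be illustrated with a small word index. This is only a sketch of the general technique (an inverted index over words of three or more letters), with hypothetical function names; it is not DokuWiki's indexer and says nothing about its on-disk format.

```python
import re
from collections import defaultdict

# Runs of three or more letters (digits and underscores excluded).
WORD = re.compile(r"[^\W\d_]{3,}")


def build_index(pages):
    """Build word -> set of page IDs from a dict of page_id -> text."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_id)
    return index


def search_whole_word(index, word):
    """Whole-word lookup: a single dictionary access."""
    return sorted(index.get(word.lower(), ()))


def search_partial(index, fragment):
    """Partial match: every indexed word has to be inspected."""
    fragment = fragment.lower()
    hits = set()
    for word, page_ids in index.items():
        if fragment in word:
            hits |= page_ids
    return sorted(hits)
```

A whole-word search touches one index entry, while a partial search walks the entire word list, which is one plausible reading of why partial searching is the slower case.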

A DBMS can make it easier to cross the single server threshold. At its simplest, the DBMS is placed on one box and the web server / application component on another. It's arguable that DokuWiki could accomplish something similar by using NFS mounts to allow multiple locally remote (i.e. LAN connected) servers to access the wiki data directory tree. I am not aware of any DokuWiki installation that has implemented anything similar.

For very large installations where the data will have to reside on multiple servers, instead of solving replication/synchronisation issues the application can hand them off to a suitable DBMS.

Experiences

I'd like to get some experiences with very large installations. Maybe from people who converted existing documents to DokuWiki. Please indicate the DokuWiki version you're using, as DokuWiki continually strives to improve its performance.

What does find ./data -name '*.txt' | wc -l say?
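
Since most reports below quote both a page count and a total size, the following sketch computes the two figures in one pass. It assumes, like the find command above, that pages are the *.txt files under ./data; page_stats is a hypothetical helper name.

```python
from pathlib import Path


def page_stats(data_dir):
    """Return (count, total_bytes) for all *.txt files under data_dir."""
    count = 0
    total = 0
    for path in Path(data_dir).rglob("*.txt"):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total
```

Calling page_stats("./data") from the wiki's root directory returns the number of page files and their combined size in bytes.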

Note: To give people some idea of how to rate your experiences, size alone is not enough. Please post some more information about your environment: which version of DokuWiki are you using, what kind of hardware is DokuWiki running on, and is it a shared or dedicated server?

  • ./data/*txt = 986 items, 6.6MB and ./media/* = 853, 227MB. (Update, 17/04/08: > Now up to 1893 pages and still going strong) Intranet installation of 55 users across three locations. Running on FreeBSD. Lovely system! Our users are continually discovering new ways to put it to work. At this size, it would be great to have a more granular search system (i.e., search whole wiki, or just a particular namespace). I love that DokuWiki is not database driven - it's great to be able to manipulate the page content using standard UNIX tools. – Nick Fahey, 23 Aug 2007
  • 187 documents (1.3M) on the UbuntuUsers Wiki – no problems. – MANY pages. We switched to MoinMoin because of the better performance and API Design.
  • 159 documents (788KB), intranet, 6 users – everything works fine.
  • 312 documents (802KB data/ and 2079KB media/), intranet, 5 users – runs smoothly.
    The search for one of the most often found words is executed in less than 1sec, generating a 141KB HTML page, while running an Anti-Virus program and other software (since Apache2 on Windows runs on a workstation instead of a dedicated server). Still, this is not a very large WikiSite.
  • 173 documents (1.2MB), here at splitbrain.org.
  • 201 documents (1.5MB data/ 9MB media/), http://www.maisenbachers.de/dokuw. Works like a breeze, nothing to complain about, no performance issues noticed so far ( ~9 GB Wiki-traffic in Oct.04)
  • 973 documents (7.3MB), intranet with 12 users – works fine and no performance problems
  • 7,733 documents (23.4MB), on single user system, searches > 5 minutes. On hosted server removed search facility and replaced with another. Even excluding 6632 documents (18.6MB) having huge problems with Web spiders such as Google, effectively creating a denial of service.
  • 94 documents, (190k), works fine but search allocates more than the default 8MB of space allowed for PHP scripts so needed to tweak php.ini.
  • One of my files was 253kb, and failed to display, presumably the parser timed out while reading the raw text file. No issues with the general data tree, which spans several meg over a half-dozen namespaces.
  • 780 pages (3.6MB), working just fine. Search code in newer versions a big improvement!
  • 295 pages (1.25MB) + 1.8MB media data, works fast and reliably. Very heavy usage of ACLs, users/groups and user private groups. The website is used as a normal website (for external users) and as a collab tool for different user groups and purposes. It grows very fast (+20 pages a day) and I am very happy about the usage of DokuWiki.
  • 650 pages (5.9 MB) Shared host; serves approx. 6000 pages a day on DokuWiki. The current version 20050922 works fine, with feed caching and better search engine. The older version had some performance issues on those two points.
  • 1477 pages (13.2 MB), intranet with 1-2 users (being tested for scalability) - Dedicated dual-processor server w/5GB RAM and eAccelerator. Searches are slow, but almost manageable. Viewing namespace indexes is almost impossible, and I get frequent PHP timeouts even after increasing the timeout value to 60 seconds.
  • 1496 pages (11.6 MB), increasing (slowly) at the rate of about 10 per day. 80 users, spread across three continents. Shared dual-processor server (Microsoft IIS). Search is problematic across namespaces, but otherwise very few problems.
  • 325 pages (14.5 MB), 50 users, on a dual-processor 2.8 GHz Opteron with 1 GB RAM, openSUSE, PHP 5: no problems, works fine.
  • Wiki: 5588 pages (31 MB wiki source) + 1712 images (196 MB media data). Hardware: 1.4GHz AMD Athlon + 256 MB RAM. DokuWiki version: 2005-05-07. Suggested search mode: download the GZIPped wiki source files (*.txt) made by cron, unzip, then use grep or other desktop software :).
  • 1500 or so pages (10.66 MB), shared (hosted) Linux server, PHP 5.1.4, latest DokuWiki. NO problems; search is good.
  • 2000 pages and 2.5 GB media over 6 DokuWiki installs on a single server - RHEL 4, twin 32-bit Intel Xeon CPUs, 8 GB RAM (server also runs various static web content), search disabled, caching enabled. Very fast and responsive.
  • 18448 pages in 131 MB of text and 22 GB of media across two servers. Both run Apache2 on RHEL 5; they are shared servers with 30 GB of RAM and ~12 64-bit CPUs. Data lives on an NFS-mounted filer. It runs a bit slow; the search is very, very slow and pretty much unusable. We have a ton of stuff writing .txt files and would love to have better search.
  • We have about 328,467 pages on our site. Total size of pages is about 2.2GB. Linode VM (4 GB RAM) running CentOS 7, Nginx, php-fpm. Wiki is configured for every language in the world (~7,000 namespaces). Main slowness is with generating report pages (for the tag plugin, for instance). Every save in our wiki automatically does a git commit and a git push to GitHub repos, where other nodes can pull the changes down via git hooks. Currently on Binky.
  • 25107 pages (139M), a few images (hardly worth talking about), shared hosting (Ubuntu 14.04) with PHP 7.0.10. The Wiki (Release 2016-06-26a “Elenor of Tsort”) runs really fast without any problems. Search is also pretty good, and even loading an index page with about 10000 subpages takes just about 2 seconds. There are some pages with a lot of backlinks (used as categories) which the Backlinks2 Plugin is not able to handle any more, so I had to remove it. But do=backlink is still fast even on those pages with a ton of backlinks. As I don't like plugins that much (as Backlinks2 shows, they often don't work well enough on big wikis), built-in categories with page breaks would be a really nice feature, because scrolling down a backlink site or namespace with hundreds of links or subpages is not much fun. DokuWiki would be far more suitable for big wikis if a feature like that were implemented. Nevertheless, DokuWiki is still an amazing piece of software and by far the best wiki engine that comes without a database.
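
Several reports above either ask for a search limited to one namespace or fall back on grep over the *.txt files. The following is a minimal sketch of that fallback as a plain, unindexed scan. It assumes namespaces are directories under the data directory; scan_namespace is a hypothetical helper, not part of DokuWiki.

```python
from pathlib import Path


def scan_namespace(data_dir, namespace, term):
    """Yield (page_id, line_number, line) for lines containing term.

    Assumption: a namespace such as 'devel' corresponds to the
    directory data_dir/devel; an empty namespace scans everything.
    """
    root = Path(data_dir)
    start = root.joinpath(*[p for p in namespace.split(":") if p])
    needle = term.lower()
    for path in sorted(start.rglob("*.txt")):
        # Rebuild the colon-separated page ID from the relative path.
        page_id = ":".join(path.relative_to(root).with_suffix("").parts)
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield page_id, number, line.rstrip("\n")
```

Because every file in the chosen subtree is read on each call, the cost grows with the size of the namespace rather than with the whole wiki, which is the practical appeal of restricting the scope.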

Discussion

  • Are there any hints on how to improve DokuWiki's performance at runtime?
  • Does somebody have an idea which elements of DokuWiki are limiting the above experiences at the moment?
    • The search is always going to be slower than SQL-based Wiki software, because DokuWiki's search has to parse a bunch of text files rather than simply performing a SQL query.
      • A plugin giving DokuWiki the ability to use SQL for the search engine would be nice…
  • I would think you could build something on top of Lucene that could be installed alongside DokuWiki and do a good job on searching. You could also try something like Swish-e, as described in How to Index Anything. Work has been done on integrating this with PHP, which gives you a starting point. PHP Search Engine Showdown is also worth a read.
  • Just an idea: DokuWiki pages are in server files; there are good search engines for files (e.g. Google Desktop). Anybody taking a test drive? You'll have to extract data from Google Desktop caches on the server. Strigi as an option?
  • It also helps to add additional caching. We use the Smartcache plugin to make better use of the browser cache.
devel/scalability.txt · Last modified: 2016-09-26 13:42 by 2003:cb:5bc2:1900:5cf2:bb9:b36a:c6e3