====== Scalability and Performance ======
  
This page tries to collect some information about how DokuWiki scales when it becomes "big" and what the limiting factors are.
  
95% of everyone considering DokuWiki will not even need to read this page. Unless you want to run a public wiki on a very popular topic or plan to use some very underpowered hardware, performance should not be a problem for you.
  
As of June 2020, the largest known DokuWiki installation has over 743,000 pages, and there are 25 known wikis with over 100,000 pages((https://forum.dokuwiki.org/d/17250-largest-dokuwikis/)).
  
===== Limiting Factors =====
  
==== Pages ====
  
The number of pages is not limited. Some people have hundreds of thousands of pages in their wiki with no problems at all, but it all "depends".
  
Your filesystem will have limits. Commonly there is a limit on how many files can be stored in a single directory. DokuWiki uses directories for namespaces, so if you organize your pages into namespaces you can mitigate the problem. Keep in mind that each revision of a page creates its own file in a directory as well.
  
The question of how many files can be stored in a single directory on different file systems is answered here: https://stackoverflow.com/questions/466521/how-many-files-can-i-put-in-a-directory
  
You are also limited by disk space, of course: media files, pages and older revisions all take up space. In addition, the data/cache directory can grow quite big over the years - depending on the wiki size, a cache directory of a few gigabytes is not uncommon. By definition everything in the cache directory is optional, so you can delete its contents at the cost of a small speed reduction until the cache is refilled.
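
To get a feel for how many files and how much disk space your wiki uses, a small standalone script can help. The following is only a sketch, not part of DokuWiki: adjust ''$datadir'' to your installation, and note that attic files may be ''.txt'' or ''.txt.gz'' depending on your compression settings.

<code php>
<?php
// Sketch: report file counts and sizes for the main DokuWiki data directories.
$datadir = '/var/www/dokuwiki/data'; // adjust to your installation

function dirStats(string $dir): array {
    $files = 0;
    $bytes = 0;
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if (!$file->isFile()) continue;
        $files++;
        $bytes += $file->getSize();
    }
    return [$files, $bytes];
}

foreach (['pages', 'attic', 'media', 'cache'] as $sub) {
    [$files, $bytes] = dirStats("$datadir/$sub");
    printf("%-6s %8d files %10.1f MB\n", $sub, $files, $bytes / 1048576);
}
</code>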
  
=== Inodes ===
Unix/Linux servers may also have a limit on the number of inodes, the per-file metadata structures of the filesystem. Running out of available inodes can interrupt the normal functioning of your server. To reduce the number of inodes used, reduce the number of files - for example, it might make sense to remove attic files that are over X years old, depending on the use case.
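
A dry-run sketch for finding such old attic files, assuming the default storage layout (adjust ''$datadir'' and the age threshold; it only prints candidates and deletes nothing):

<code php>
<?php
// Sketch: list attic revisions older than a given number of years (dry run only).
$datadir  = '/var/www/dokuwiki/data'; // adjust to your installation
$maxYears = 5;                        // age threshold - pick what suits your use case

$cutoff = time() - $maxYears * 365 * 24 * 3600;
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator("$datadir/attic", FilesystemIterator::SKIP_DOTS)
);
foreach ($it as $file) {
    if ($file->isFile() && $file->getMTime() < $cutoff) {
        echo $file->getPathname(), "\n"; // candidate for removal - review before deleting!
    }
}
</code>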
  
==== Mediafiles ====
  
Basically everything said for pages is also true for media files.
  
Since the media manager does not do paging (currently), it is recommended to make use of namespaces to organize the mediafiles. Loading hundreds of thumbnails is going to be slow.
  
Mediafiles are resized through PHP's GD library (libGD) by default. A more efficient way may be to use ImageMagick via the [[config:im_convert]] option.
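
For example, in ''conf/local.php'' (the path to the ''convert'' binary shown here is only an example and will differ per system):

<code php>
<?php
// conf/local.php
$conf['im_convert'] = '/usr/bin/convert'; // full path to ImageMagick's convert binary (example path)
</code>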

==== Search ====

In the early days of DokuWiki, search used to be the limiting factor: a search simply went through all available pages and looked for the search term, so the more pages you had, the slower it was.

Today, DokuWiki uses an **index based search**. The search index makes searching much faster since a term is simply looked up in the index and the results are immediately available. The index is word based and sorted by word length, so a search for a full word is faster than a search for a word part (using the ''*'' syntax).

In theory, the limiting factor for the index based search is the ''memory_limit'' in your PHP setup: part of the index is a list of all pages in the wiki, and this list needs to be loaded completely into RAM. In practice, we have never heard of anyone hitting this limit.
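
If you want to check where you stand, a rough sketch like the following can help, assuming the default index layout where ''data/index/page.idx'' lists one page ID per line:

<code php>
<?php
// Sketch: compare the size of the indexed page list with PHP's memory_limit.
$pageidx = '/var/www/dokuwiki/data/index/page.idx'; // adjust to your installation

$pages = file($pageidx, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

printf("pages in index: %d\n", count($pages));
printf("page.idx size : %.1f MB\n", filesize($pageidx) / 1048576);
printf("memory_limit  : %s\n", ini_get('memory_limit'));
</code>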

==== Disk I/O ====

DokuWiki relies heavily on files for storing everything. The most significant speedup you can achieve is using an SSD for your setup.

Modern operating systems cache disk access to frequently used files in memory. The more RAM your server has, the more of it is available to be used as file system cache.

DokuWiki does a lot of checking for files that may not exist. For example, we check each plugin directory for the existence of a script.js file. Those misses can add up to a significant amount of I/O on very busy setups.
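
A simplified illustration of the kind of checks involved (not DokuWiki's actual code; run from the DokuWiki root):

<code php>
<?php
// Simplified illustration: check every plugin directory for an optional
// script.js file - every miss is still a filesystem stat() call.
foreach (glob('lib/plugins/*', GLOB_ONLYDIR) as $plugin) {
    if (file_exists("$plugin/script.js")) {
        // this plugin ships extra JavaScript that would be bundled
    }
}
</code>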

We try to avoid whole directory scans as much as possible, but sometimes they are necessary. The [[config:readdircache]] option may help mitigate the problem somewhat.
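
A sketch of enabling it in ''conf/local.php'', assuming the option takes a cache lifetime in seconds (see [[config:readdircache]] for the exact semantics):

<code php>
<?php
// conf/local.php
$conf['readdircache'] = 3600; // assumed: lifetime of cached directory listings, in seconds
</code>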

The reliance on the filesystem, however, makes it extremely difficult to run DokuWiki on multiple servers. For performance reasons, using a network filesystem is not recommended.

Disabling access time recording in your filesystem (e.g. the ''noatime'' mount option on Linux) may be a good idea to decrease I/O load.

==== PHP Version ====

Each new PHP version gets better and faster. There is a huge difference between PHP 5 and 7, and smaller differences between the minor versions of 7. Always use the newest version available to you.

==== Webserver ====

The webserver needs to pass requests to PHP. We recommend using mod_php or PHP-FPM for that; plain FastCGI is slower. Unfortunately, FastCGI seems to be the only option when using IIS, so if you can use something else, do so.
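
If you are unsure which mechanism your server uses, a throwaway PHP file in the webroot will tell you (hypothetical filename ''sapi.php''; remove it again afterwards):

<code php>
<?php
// Prints e.g. "apache2handler" (mod_php), "fpm-fcgi" (PHP-FPM) or "cgi-fcgi" (CGI/FastCGI).
echo php_sapi_name();
</code>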

===== Mitigations =====

The sections above already give a few hints on where you may experience bottlenecks. Here is a list of further reading on how to mitigate problems when they occur:

  * use better hardware - sometimes simply upgrading to a better server (more RAM, an SSD) is the simplest solution
  * use a dedicated search engine
    * [[plugin:sphinxsearch]]
    * [[plugin:googlesearch]]
    * [[plugin:solr]]
  * improve end user caching
    * [[plugin:smartcache]]

===== See also =====

  * [[faq:database|FAQ: How about using a Database?]]
  * [[wpmeta>DokuWiki_vs_MediaWiki_benchmarks|DokuWiki vs MediaWiki benchmarks]]

===== Experiences =====

We'd like to collect some real-world numbers from "big" installations here. If you are running a larger installation, please let us know about your setup, especially if you ran into scaling problems and solved them.

> **dokuwiki.org** currently has 4859 pages with about 15k pageviews per day on average,
> still on PHP 5.6, Linux/ext4 with noatime, Apache + FPM,
> running on an i7-6700 CPU with 64GB of RAM and an SSD. That server runs all kinds of other DokuWiki services besides the wiki itself. Everything is fine and snappy.
>  --- [[user>andi|Andreas Gohr]] //2019-02-28 12:23//

>> I am also curious whether there are other "big" DokuWiki sites in the world, or whether DokuWiki.org itself may be a candidate for the trophy. But it seems hard to find evidence or real statistics, doesn't it?
>> **[Off-topic]** //I myself am also wondering how such big DokuWiki sites are organized and run.//
>> --- [[user>MilchFlasche|MilchFlasche]] //2019-04-02 10:58//