====== Scalability ======

This page tries to collect some information about how DokuWiki scales when it becomes "big" - be it in the number of pages, media files, users or the amount of traffic.

95% of everyone considering using DokuWiki will never run into any of the limits described here; the points below mostly matter for very large installations.

As of June 2020, the largest known DokuWiki installation is over 743,000 pages and there are 25 known wikis with over 100,000 pages((https://...)).

===== Limiting Factors =====

==== Pages ====

The number of pages is not limited. Some people run wikis with hundreds of thousands of pages.

Your filesystem will have limits, though. Commonly there's a maximum number of files that can be stored (or efficiently accessed) in a single directory. Since DokuWiki stores each page as a text file and each namespace as a directory, this limit applies per namespace. How many files can be stored per directory varies between file systems, so check the documentation of the file system you use for concrete numbers.

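To see whether any single namespace is getting close to such limits, a quick shell check can help. This is a sketch assuming the default ''data/pages'' layout; adjust the path if your [[config:savedir]] points elsewhere:

<code bash>
# List the 20 namespace directories containing the most page files
find data/pages -type d | while read -r dir; do
  printf '%6d %s\n' "$(find "$dir" -maxdepth 1 -type f | wc -l)" "$dir"
done | sort -rn | head -n 20
</code>
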
+ | |||
+ | Of course you are limited by disk space. Of course media files, pages and older revisions will take up space. However the data/cache directory can grow quite big over the years - depending on the wiki size a cache directory of a few Gigabytes is not uncommon. By definition everything in the cache directory is optional so you can delete it's contents on the penalty of a small speed reduction until the cache is refilled. | ||
+ | |||
=== Inodes ===

Unix/Linux servers may also have a limit on the number of inodes, the metadata records the file system keeps for each file. Running out of available inodes can interrupt the normal functioning of your server even while disk space is still free. To reduce the number of inodes used, reduce the number of files - for example, it might make sense to remove attic files (old revisions) that are over X years old, depending on the use case.

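Checking inode usage and pruning old revisions might look like this on a typical Linux system; the five-year threshold is only an example value, and the default ''data/attic'' location is assumed:

<code bash>
# Show inode usage per mounted file system
df -i

# Delete old revisions that haven't been touched in roughly five years
find data/attic -type f -mtime +1825 -delete
</code>
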
+ | |||
==== Mediafiles ====

Basically everything said about pages is also true for media files.

Since the media manager does not do paging (currently), namespaces containing very many media files can become slow to open and may run into PHP's memory or execution time limits.

Mediafiles are resized through PHP's libGD mechanism by default. A more efficient way may be to use ImageMagick via the [[config:im_convert]] option.

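A minimal sketch of switching to ImageMagick, assuming the ''convert'' binary is installed at a standard path:

<code bash>
# Locate ImageMagick's convert binary
which convert    # e.g. /usr/bin/convert

# Then point DokuWiki at it by adding this line to conf/local.php:
#   $conf['im_convert'] = '/usr/bin/convert';
</code>
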
==== Search ====

In theory, the limiting factor for the index based search is the memory_limit in your PHP setup. Part of the index is a list of all pages in the wiki - this list needs to be loaded completely into RAM. In reality, I have never heard of anyone having this problem.

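Checking and raising the limit might look like this; note that the command line and the web server can use different ''php.ini'' files:

<code bash>
# Show the current limit as seen by the CLI
php -r 'echo ini_get("memory_limit"), PHP_EOL;'

# Raise it in the php.ini used by your web server, e.g.:
#   memory_limit = 256M
</code>
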
==== Disk I/O ====

DokuWiki relies heavily on files for storing everything. The most significant speed-up you can achieve is running your setup on an SSD.

Modern operating systems cache disk access to frequently used files in memory. The more RAM your server has, the more of it is available to be used as file system cache.

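On Linux you can see how much memory is currently serving as file system cache in the ''buff/cache'' column of ''free'':

<code bash>
# Memory used by the kernel's page cache appears under "buff/cache"
free -h
</code>
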
DokuWiki does a lot of checking for files that may not exist. For example, each plugin directory is checked for the existence of a script.js file. Those misses can add up to a significant amount of I/O on very busy setups.

We try to avoid whole directory scans as much as possible, but sometimes they are necessary.

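To get a feeling for how many file system lookups a single request really triggers, you can attach ''strace'' to a PHP worker for a moment. This is a sketch: the PID is a placeholder, and the ''%stat'' syscall class requires a reasonably recent strace:

<code bash>
# Summarise stat-family syscalls of one PHP worker for ten seconds
# (replace 1234 with a real PID, e.g. from "pgrep php-fpm")
timeout 10 strace -c -e trace=%stat -p 1234
</code>
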
The reliance on the filesystem, however, makes it extremely difficult to run a DokuWiki on multiple servers. For performance reasons, it is not recommended to use a network filesystem.

Disabling access time recording in your filesystem may be a good idea to decrease I/O load.

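On Linux this is usually done with the ''noatime'' mount option. The mount point below is a placeholder - use the one that holds your wiki data:

<code bash>
# Remount without access-time updates for the running system
mount -o remount,noatime /var/www

# Make it permanent via the options column in /etc/fstab, e.g.:
#   UUID=...  /var/www  ext4  defaults,noatime  0  2
</code>
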
==== PHP Version ====

Each and every new PHP version gets better and faster. There is a huge difference between PHP 5 and 7, and there are smaller differences between the individual minor versions. Running the most recent PHP version your environment supports is an easy performance win.

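Keep in mind that the command line and the web server may run different PHP versions. A throwaway probe placed in the wiki's web root shows what the wiki actually runs on (the URL is hypothetical - use your own, and delete the file right after checking):

<code bash>
php -v    # version used on the command line

# Version used by the web server:
echo '<?php echo PHP_VERSION;' > version-probe.php
curl https://wiki.example.com/version-probe.php
rm version-probe.php
</code>
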
==== Webserver ====

The Webserver needs to pass requests on to PHP. How this is set up (mod_php, CGI, FastCGI/PHP-FPM) influences how many requests can be handled in parallel and how much memory each of them consumes.

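With PHP-FPM, for example, the pool settings determine the number of worker processes. A quick way to inspect them - the path varies by distribution, so this is an assumption:

<code bash>
# Show the process-manager settings of the default FPM pool
grep -E '^pm' /etc/php/*/fpm/pool.d/www.conf
</code>
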
===== Mitigations =====

The above already gives a few hints on where you may experience bottlenecks. Here is a list of further reading on how to mitigate problems when they occur.

  * use a dedicated search engine instead of the built-in index
    * for example [[plugin:sphinxsearch]] or [[plugin:elasticsearch]]
  * improve end user caching
    * for example with a caching plugin or a reverse proxy in front of the wiki

===== See also =====

  * [[faq:database|FAQ: How about using a Database?]]

===== Experiences =====

We'd like to collect some real world numbers from "big" DokuWiki installations here.

> **dokuwiki.org** currently has 4859 pages with about 15k pageviews per day on average,
> still on PHP 5.6, Linux/ext4 with noatime, Apache + FPM,
> running on an i7-6700 CPU with 64GB of RAM and an SSD.
> --- [[user>andi]]

>> I am also curious if there are other "big" installations out there.
>> **[Off-topic]** //I myself am also wondering how such big DokuWiki sites are organized and run.//