DokuWiki

It's better when it's simple


Differences

This shows you the differences between two versions of the page.

search [2023-08-21 21:56] – [Some Background on the Searchindex] Klap-in
search [2023-11-29 05:23] (current) – [Some Background on the Searchindex] state that numbering starts at 0 in page.idx – schplurtz
Line 64:
DokuWiki uses an index so that even large wikis can be searched quickly. To find anything, the index must contain current data. Information about a page's content is added or updated when a user views the page: each page includes an invisible image ([[wp>webbug]]) that triggers the index update process if needed, that is, if the timestamp of the page is newer than the timestamp of the index file.((Note that the webbug is used for other tasks, too. See http://forum.dokuwiki.org/post/3116))
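The update rule in this paragraph comes down to a timestamp comparison. The following is a minimal Python sketch of that rule as stated here, not of DokuWiki's own PHP code: the helper ''needs_reindex'' and the file names ''start.txt'' and ''index.idx'' are hypothetical, and DokuWiki's internal bookkeeping of index freshness may differ.

```python
import os
import tempfile

def needs_reindex(page_file: str, index_file: str) -> bool:
    """Hypothetical helper: True if the page changed after the index was written.

    Mirrors the rule described above: update the index when the page's
    timestamp is newer than the index file's timestamp. A missing index
    file always counts as stale.
    """
    if not os.path.exists(index_file):
        return True
    return os.path.getmtime(page_file) > os.path.getmtime(index_file)

# Usage: create a page and an index file with explicit modification times.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "start.txt")    # hypothetical page file
    index = os.path.join(tmp, "index.idx")   # hypothetical index file
    for path in (page, index):
        with open(path, "w", encoding="utf-8") as f:
            f.write("x")
    os.utime(index, (1000, 1000))
    os.utime(page, (2000, 2000))             # page edited after indexing
    stale = needs_reindex(page, index)       # True
    os.utime(index, (3000, 3000))            # index refreshed afterwards
    fresh = needs_reindex(page, index)       # False
    missing = needs_reindex(page, os.path.join(tmp, "absent.idx"))  # True
```

Using explicit times via ''os.utime'' keeps the example deterministic; in a live wiki the timestamps come from actual edits and index writes.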
  
- The index consists of files called ''page.idx'', ''w//[n]//.idx'' and ''i//[n]//.idx'' located in the index directory. ''w//[n]//.idx'' contains a list of all words (except stopwords) with a length of //n// that appear on the wiki pages. For every line in ''w//[n]//.idx'' there is a line in the corresponding ''i//[n]//.idx'' file that contains page references in the form of ''pn*freq''. ''pn'' is a line number for ''page.idx'', ''freq'' denotes how often the word appears on the page. Multiple page references are separated with a colon.
+ The index consists of files called ''page.idx'', ''w//[n]//.idx'' and ''i//[n]//.idx'' located in the index directory. ''w//[n]//.idx'' contains a list of all words (except stopwords) with a length of //n// that appear on the wiki pages. For every line in ''w//[n]//.idx'' there is a line in the corresponding ''i//[n]//.idx'' file that contains page references in the form of ''pn*freq''. ''pn'' is a line offset for ''page.idx'', ''freq'' denotes how often the word appears on the page. Multiple page references are separated with a colon.
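The file layout described in this paragraph can be sketched as a word lookup. This is an illustrative Python sketch, not DokuWiki's PHP indexer. It assumes ''w//[n]//.idx'' is named like ''w4.idx'' for //n// = 4, that //n// is simply the word's character count (true for plain ASCII words; the real indexer may measure length differently), and that ''pn'' counts from 0, as the revision summary states. The page IDs in the usage section are made-up sample data.

```python
import os
import tempfile

def read_lines(path: str) -> list[str]:
    """Return the lines of an index file; line k holds entry k (0-based)."""
    with open(path, encoding="utf-8") as f:
        return f.read().split("\n")

def parse_refs(line: str) -> dict[int, int]:
    """Parse 'pn*freq:pn*freq' into {pn: freq}."""
    refs = {}
    for item in line.split(":"):
        if item:  # skip empty fields
            pn, freq = item.split("*")
            refs[int(pn)] = int(freq)
    return refs

def lookup(index_dir: str, word: str) -> dict[str, int]:
    """Return {page_id: frequency} for an exact match of `word`."""
    n = len(word)  # assumption: n is the character count of the word
    words_path = os.path.join(index_dir, f"w{n}.idx")
    if not os.path.exists(words_path):
        return {}
    words = read_lines(words_path)
    if word not in words:
        return {}
    row = words.index(word)
    # The same row in i[n].idx holds the page references for this word.
    refs_line = read_lines(os.path.join(index_dir, f"i{n}.idx"))[row]
    pages = read_lines(os.path.join(index_dir, "page.idx"))
    # pn is a 0-based line offset into page.idx.
    return {pages[pn]: freq for pn, freq in parse_refs(refs_line).items()}

# Usage with a tiny hand-written index (sample data, not a real wiki).
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "page.idx": "wiki:start\nwiki:syntax\n",
        "w4.idx": "wiki\nedit\n",
        "i4.idx": "0*3:1*1\n1*2\n",
    }
    for name, content in files.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(content)
    hits_wiki = lookup(tmp, "wiki")     # {'wiki:start': 3, 'wiki:syntax': 1}
    hits_edit = lookup(tmp, "edit")     # {'wiki:syntax': 2}
    hits_none = lookup(tmp, "toolong")  # {} because there is no w7.idx
```

Splitting words by length means a lookup only has to scan one ''w//[n]//.idx'' file, and the shared row number links each word to its page references without storing the word twice.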
  
The [[taskrunner|indexer]] uses a language-specific stopword file containing very common words that are never indexed (e.g. the word ''the'' in English). Searching for such a word returns no hits. The stopword file is located in the language folder of the DokuWiki installation, ''<dokuwiki>/inc/lang/<language>/stopwords.txt'', so you can edit the file in the appropriate folder to add or remove words excluded from the index for that language. (Unfortunately, these edits are not preserved when DokuWiki is upgraded.)
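Dropping stopwords before words are counted can be sketched as follows. This is an illustrative Python sketch, not DokuWiki's PHP indexer. The tokenizer (lowercase, then split on runs of word characters) is an assumption, and so is treating lines that start with ''#'' as comments in ''stopwords.txt''. The stopword file in the usage section is sample content standing in for ''<dokuwiki>/inc/lang/en/stopwords.txt''.

```python
import os
import re
import tempfile
from collections import Counter

def load_stopwords(path: str) -> set[str]:
    """Read one stopword per line; assumes lines starting with '#' are comments."""
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word and not word.startswith("#"):
                words.add(word.lower())
    return words

def count_indexable_words(text: str, stopwords: set[str]) -> Counter:
    """Count word frequencies on a page, dropping stopwords.

    Hypothetical tokenizer: lowercases the text and splits on runs of
    word characters.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Usage with a sample stopword file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("# sample stopword list\nthe\nand\n")
    stop = load_stopwords(path)  # {'the', 'and'}
    counts = count_indexable_words("The wiki and the index: the WIKI", stop)
    # counts holds 'wiki': 2 and 'index': 1; 'the' and 'and' are never counted
```

Because stopwords are removed before any frequencies are recorded, they never reach ''w//[n]//.idx'', which is why a search for them returns no hits.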
search.txt · Last modified: 2023-11-29 05:23 by schplurtz

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International