This is an old revision of the document!
Compatible with DokuWiki
This extension has not been updated in over 2 years. It may no longer be maintained or supported and may have compatibility issues.
This plugin provides a better search experience in Asian languages by manipulating a search query. No re-indexing is required.
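DokuWiki has a full-text search function which now also supports Asian languages. However, searching Asian-language text still has some problems, as the examples below show: highlights in search results are fragmented and noisy, and a query whose terms are separated by an ideographic space (U+3000) may return no hits.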
This plugin solves these problems by manipulating a search query, never making changes to your DokuWiki's search index files.
Let's assume that your DokuWiki has a page whose text is:
京都から東海道新幹線で東に向かうと、東京に着いた。
[ 東京 ]
| Plain DokuWiki | 京都から東海道新幹線で東に向かうと、東京に着いた。 | Too fragmented. Noisy. |
| With this plugin | 京都から東海道新幹線で東に向かうと、東京に着いた。 | Good. |
[ 新幹線 東京 ]
| Plain DokuWiki | No Hits! | Why? |
| With this plugin | 京都から東海道新幹線で東に向かうと、東京に着いた。 | As I expected. |
Note that the space between "新幹線" and "東京" is not a normal space but an ideographic space (U+3000). In a plain DokuWiki, the search query [ 新幹線　東京 ] is parsed as a monolithic "新幹線　東京", not as separate "新幹線" and "東京".
This plugin manipulates a search query by using the following steps:
Below is an example of a complicated query.
You can see that Asian characters are quoted, ideographic spaces are replaced with normal spaces, and nothing is changed within preexistent phrases.
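The three effects described above can be sketched in a few lines. This is only an illustrative Python model, not the plugin's actual code (the plugin itself runs inside DokuWiki). The function name `preprocess_query` and the character ranges used for "Asian characters" (kana, CJK unified ideographs, Hangul syllables) are assumptions made for this sketch.

```python
import re

# Assumption: a rough approximation of "Asian characters" for this sketch
# (Hiragana/Katakana, CJK unified ideographs, Hangul syllables).
ASIAN = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3"
ASIAN_RUN = re.compile(f"[{ASIAN}]+")

# A double-quoted phrase; the capture group makes re.split keep the phrases.
QUOTED = re.compile(r'("[^"]*")')


def preprocess_query(query: str) -> str:
    """Hypothetical helper: rewrite a search query as described above."""
    out = []
    # re.split with one capture group alternates: outside, phrase, outside, ...
    for i, part in enumerate(QUOTED.split(query)):
        if i % 2 == 1:
            # An existing phrase: leave it completely untouched.
            out.append(part)
            continue
        # Outside phrases: ideographic spaces become normal spaces ...
        part = part.replace("\u3000", " ")
        # ... and each run of Asian characters is wrapped in quotes.
        part = ASIAN_RUN.sub(lambda m: f'"{m.group(0)}"', part)
        out.append(part)
    return "".join(out)


print(preprocess_query("新幹線\u3000東京"))
# → "新幹線" "東京"
print(preprocess_query('dokuwiki "検\u3000索"\u3000プラグイン'))
# → dokuwiki "検　索" "プラグイン"
```

In the second call the ideographic space inside the existing phrase is preserved, while the one outside it is normalized.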
By default, DokuWiki treats each Asian character as a “word” and, additionally, each run of successive Asian characters as a “phrase”. DokuWiki highlights both “words” and “phrases” in search results. Preprocessing a search query in the above way reduces the number of “words”, resulting in neat highlights in your search results.
By checking the returned values of ft_queryParser, you can see how DokuWiki parsed these queries.
Plain DokuWiki:
'phrases':
  - 'asiansearch plugin'
  - '検 索'
  - ' プラグイン 插件 플러그인'
'words':
  - 'dokuwiki'
  - 'プ'
  - 'ラ'
  - 'グ'
  - 'イ'
  - 'ン'
  - '插'
  - '件'
  - '플'
  - '러'
  - '그'
  - '인'

With this plugin:
'phrases':
  - 'asiansearch plugin'
  - 'プラグイン'
  - '插件'
  - '플러그인'
  - '検 索'
'words':
  - 'dokuwiki'