INDEXER_TEXT_PREPARE

Description: Page tokenizing (a.k.a. splitting the text into separate words)
Default action: handle Asian characters as words

This event is signalled by tokenizer() in inc/indexer.php when a page or a search term is about to be split into words. Handlers can use it to modify how words are detected. The default action uses a regular expression to separate Asian characters into single words.
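The following is a minimal sketch of what the default action achieves, written in Python rather than DokuWiki's PHP. It assumes a simplified character class (Hiragana, Katakana and CJK Unified Ideographs only); the pattern DokuWiki actually uses in inc/indexer.php covers more ranges. The function names `separate_asian_chars` and `tokenize` are hypothetical.

```python
import re

# Assumption: a simplified subset of "Asian" characters (Hiragana, Katakana,
# CJK Unified Ideographs). This is NOT DokuWiki's exact pattern.
ASIAN_CHAR = re.compile(r"([\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF])")


def separate_asian_chars(text: str) -> str:
    """Surround every matching character with spaces so each one becomes its own word."""
    return ASIAN_CHAR.sub(r" \1 ", text)


def tokenize(text: str) -> list:
    """Prepare the text, then split it on whitespace (spaces or newlines)."""
    return separate_asian_chars(text).split()


print(tokenize("DokuWiki 日本語 wiki"))  # → ['DokuWiki', '日', '本', '語', 'wiki']
```

Latin words are left intact, while each CJK character ends up as a separate word once the prepared text is split on whitespace.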

If you intercept this event, you should also add your plugin to the index version using the INDEXER_VERSION_GET event, so that DokuWiki can detect that the index was built with a different tokenizer and reindex pages.
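Why the version matters can be sketched as follows, in Python and under stated assumptions. It assumes that INDEXER_VERSION_GET handlers add a plugin-name/version entry to a shared map, and that the entries are folded into one version string compared against the stored one. The core value `"8"`, the plugin name `mytokenizer`, the `+name=version` format and all function names are hypothetical illustrations, not DokuWiki's API.

```python
def handle_indexer_version_get(data: dict) -> None:
    """Hypothetical plugin handler: register the version of its tokenizing rules."""
    data["mytokenizer"] = "1"  # bump this whenever the tokenizing changes


def combined_index_version(core_version: str, handlers) -> str:
    """Collect plugin versions and fold them into one string (assumed format)."""
    data = {"dokuwiki": core_version}
    for handler in handlers:
        handler(data)
    del data["dokuwiki"]  # the core version is already the prefix
    version = core_version
    for plugin in sorted(data):  # sort so the result does not depend on handler order
        version += f"+{plugin}={data[plugin]}"
    return version


def needs_reindex(stored_version: str, current_version: str) -> bool:
    """An index built under a different version is stale and should be rebuilt."""
    return stored_version != current_version


current = combined_index_version("8", [handle_indexer_version_get])
print(current)                      # → 8+mytokenizer=1
print(needs_reindex("8", current))  # → True
```

Without the version entry, an index built before the plugin was installed would look current even though its words were split differently.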

Passed Data

$data contains the string before it is split into words. The string is either the text of a page or an individual term of a search query. Your plugin should modify the text so that words are separated by spaces or newlines.
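The sketch below models a handler in Python. The `Event` class is a hypothetical stand-in for the event object (in a real PHP plugin the handler would assign the new string to `$event->data`), and the camelCase splitting is an invented example of custom word detection, not something DokuWiki does. Because the same hook receives both page text and search terms, applying one transformation to both keeps queries consistent with what was indexed.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the event object; `data` holds the text to be split."""
    data: str


# Zero-width position between a lowercase and an uppercase letter, e.g. "Doku|Wiki".
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def handle_text_prepare(event: Event) -> None:
    """Hypothetical handler: insert a space at camelCase boundaries so they become separate words."""
    event.data = CAMEL_BOUNDARY.sub(" ", event.data)


page = Event("The DokuWiki indexer")  # text of a page
handle_text_prepare(page)
print(page.data.split())  # → ['The', 'Doku', 'Wiki', 'indexer']

term = Event("DokuWiki")  # a single search term
handle_text_prepare(term)
print(term.data.split())  # → ['Doku', 'Wiki']
```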

See also

devel/event/indexer_text_prepare.txt · Last modified: 2018-12-08 15:41 by torpedo

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International