Page tokenizing (i.e. splitting the text into separate words)
handle Asian characters as words

This event is signalled by tokenizer() in inc/indexer.php when a page or search term is about to be split into words. Handlers can use it to change how words are detected. The default action uses a regular expression to separate Asian characters into single words.

If you intercept this event, you should also add your plugin to the index version using the INDEXER_VERSION_GET event.
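
As a rough illustration of that advice, an action plugin might register handlers for both events along the following lines. This is only a sketch: the plugin name "example", the handler method names, and the version string are hypothetical, and it assumes that the data of INDEXER_VERSION_GET is an associative array of name => version entries. It runs only inside DokuWiki, not as a standalone script.

```php
<?php
// Sketch only: "example" is a hypothetical plugin name
// (the file would be lib/plugins/example/action.php).

use dokuwiki\Extension\ActionPlugin;
use dokuwiki\Extension\Event;
use dokuwiki\Extension\EventHandler;

class action_plugin_example extends ActionPlugin
{
    /** Register a handler for each of the two indexer events. */
    public function register(EventHandler $controller)
    {
        $controller->register_hook('INDEXER_TEXT_PREPARE', 'BEFORE', $this, 'handleTextPrepare');
        $controller->register_hook('INDEXER_VERSION_GET', 'BEFORE', $this, 'handleVersionGet');
    }

    /** Adjust the text before it is split into words. */
    public function handleTextPrepare(Event $event, $param)
    {
        // $event->data is the string; change it here so that
        // words end up separated by spaces or newlines.
    }

    /** Add this plugin to the index version (assumed: array of name => version). */
    public function handleVersionGet(Event $event, $param)
    {
        $event->data['plugin_example'] = '0.1'; // hypothetical key and version
    }
}
```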

Passed Data

$data contains the string before it is split into words. The source of the string is either the text of a page or an individual term of a search query. Your plugin should modify the text so that words are separated by spaces or newlines.
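
To make the expected transformation concrete, here is a minimal sketch of a helper that puts spaces around certain characters so that each one becomes its own word. The function name example_separate_cjk() and the choice of the Han, Hiragana and Katakana scripts are assumptions made for illustration; this is not the regular expression used by the default action.

```php
<?php
/**
 * Hypothetical helper: surround each Han, Hiragana or Katakana
 * character with spaces so that it is treated as a separate word.
 */
function example_separate_cjk(string $text): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $spaced = preg_replace('/[\p{Han}\p{Hiragana}\p{Katakana}]/u', ' $0 ', $text);
    if ($spaced === null) {
        // preg_replace() returns null on error (e.g. invalid UTF-8);
        // fall back to the unmodified text.
        return $text;
    }
    // Collapse the runs of spaces introduced above and strip spaces at both ends.
    return trim(preg_replace('/ {2,}/', ' ', $spaced), ' ');
}
```

Inside a handler for this event, the result would be written back with something like `$event->data = example_separate_cjk($event->data);`. Whether to also call `$event->preventDefault()` depends on whether the default handling of Asian characters should still run afterwards.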

