Token Filters #

Token filters accept a stream of tokens from a tokenizer and can modify tokens (e.g. lowercasing them), delete tokens (e.g. removing stopwords), or add tokens (e.g. injecting synonyms).

Elasticsearch has a number of built-in token filters which can be used to build custom analyzers.
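For example, a custom analyzer that chains the built-in lowercase and asciifolding filters with a configured stop filter could be defined in the index settings along the following lines. This is only a sketch: the index name my_index and the names my_stopwords and my_analyzer are placeholders, and the exact request syntax varies slightly between Elasticsearch versions.

    curl -XPUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d '
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_stopwords": {
              "type": "stop",
              "stopwords": ["the", "and", "of"]
            }
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": ["lowercase", "asciifolding", "my_stopwords"]
            }
          }
        }
      }
    }'

Text analyzed with my_analyzer is split by the standard tokenizer and then passed through the token filters in the order they are listed: lowercased, ASCII-folded, and finally stripped of the configured stopwords.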

Standard Token Filter #

standard

Currently does nothing.

ASCII Folding Token Filter #

asciifolding

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the “Basic Latin” Unicode block) into their ASCII equivalents, if one exists.
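As a quick sketch (assuming Elasticsearch 5.x or later, where the _analyze API accepts a JSON body with a filter list), the effect of asciifolding can be checked directly:

    curl -XPOST 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
    {
      "tokenizer": "standard",
      "filter": ["asciifolding"],
      "text": "déjà vu"
    }'

This should return the tokens deja and vu, with the accented characters replaced by their ASCII equivalents.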

Length Token Filter #

length

Lowercase Token Filter #

lowercase

Uppercase Token Filter #

uppercase

NGram Token Filter #

nGram

Edge NGram Token Filter #

edgeNGram

Porter Stem Token Filter #

porter_stem

Shingle Token Filter #

shingle

Stop Token Filter #

stop

Word Delimiter Token Filter #

word_delimiter

Stemmer Token Filter #

stemmer

Stemmer Override Token Filter #

stemmer_override

Keyword Marker Token Filter #

keyword_marker

Keyword Repeat Token Filter #

keyword_repeat

KStem Token Filter #

kstem

Snowball Token Filter #

snowball

Phonetic Token Filter #

phonetic

Synonym Token Filter #

synonym

Compound Word Token Filter #

dictionary_decompounder, hyphenation_decompounder

Reverse Token Filter #

reverse

Elision Token Filter #

elision

Truncate Token Filter #

truncate

Unique Token Filter #

unique

Pattern Capture Token Filter #

pattern_capture

Pattern Replace Token Filter #

pattern_replace

Trim Token Filter #

trim

Limit Token Count Token Filter #

limit

Hunspell Token Filter #

hunspell

Common Grams Token Filter #

common_grams

Normalization Token Filter #

arabic_normalization, persian_normalization

CJK Width Token Filter #

cjk_width

CJK Bigram Token Filter #

cjk_bigram

Delimited Payload Token Filter #

delimited_payload_filter

Keep Words Token Filter #

keep

Keep Types Token Filter #

keep_types

Classic Token Filter #

classic

Apostrophe Token Filter #

apostrophe