Analyzers #
Analyzers are composed of a single Tokenizer and zero or more TokenFilters; the tokenizer may be preceded by one or more CharFilters. The analysis module lets you register TokenFilters, Tokenizers, and Analyzers under logical names that can then be referenced either in mapping definitions or in certain APIs. The analysis module automatically registers the built-in analyzers, token filters, and tokenizers (if not explicitly defined).
Test an analyzer (e.g. the standard analyzer) with the _analyze API:
GET _analyze?analyzer=standard&text=this is a test
Standard Analyzer #
standard
- Standard Tokenizer
- Standard Token Filter
- Lower Case Token Filter
- Stop Token Filter
Settings:
- stopwords — a list of stopwords to initialize the stop token filter with
- max_token_length — the maximum token length; longer tokens are split (defaults to 255)
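A minimal sketch of registering a customized standard analyzer under a logical name in the index settings (the index name my_index and the analyzer name my_standard are placeholders):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_standard": {
          "type": "standard",
          "max_token_length": 255,
          "stopwords": "_english_"
        }
      }
    }
  }
}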
Simple Analyzer #
simple
- Lower Case Tokenizer
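The lowercase tokenizer divides text at anything that is not a letter and lowercases the result, so digits and punctuation are dropped. For example:

GET _analyze?analyzer=simple&text=The 2 QUICK Brown-Foxes

produces the tokens [the, quick, brown, foxes].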
Whitespace Analyzer #
whitespace
- Whitespace Tokenizer
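The whitespace tokenizer divides text at whitespace only and does not lowercase, so the same input keeps its original case and punctuation:

GET _analyze?analyzer=whitespace&text=The 2 QUICK Brown-Foxes

produces the tokens [The, 2, QUICK, Brown-Foxes].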
Stop Analyzer #
stop
- Lower Case Tokenizer
- Stop Token Filter
Settings:
- stopwords — a list of stopwords to initialize the stop token filter with (defaults to the English stop words); use stopwords: _none_ to explicitly specify an empty stopwords list
- stopwords_path — a path (relative to the config location, or absolute) to a stopwords file
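A sketch of a custom stop analyzer with an explicit stopword list (my_index and my_stop are placeholder names):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}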
Keyword Analyzer #
keyword
Treats the entire stream as a single token.
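For example:

GET _analyze?analyzer=keyword&text=New York

returns the single unmodified token [New York].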
Pattern Analyzer #
pattern
Flexibly separates text into terms via a regular expression.
Settings:
- lowercase — whether terms are lowercased (defaults to true)
- pattern — the regular expression to split on (defaults to \W+)
- flags — regular expression flags
- stopwords — a list of stopwords to initialize the stop token filter with (defaults to _none_)
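A sketch of a pattern analyzer that splits on commas instead of the default \W+ (my_index and my_csv are placeholder names):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_csv": {
          "type": "pattern",
          "pattern": ",",
          "lowercase": true
        }
      }
    }
  }
}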
Language Analyzers #
Set of language-specific analyzers (e.g. english, french, german). See the language analyzers documentation for the full list and per-language settings.
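For example, the english analyzer removes stopwords and stems the remaining terms:

GET _analyze?analyzer=english&text=running quickly

produces stemmed tokens such as [run, quickli].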
Snowball Analyzer #
snowball
- Standard Tokenizer
- Standard Token Filter
- Lower Case Token Filter
- Stop Token Filter
- Snowball Token Filter
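A sketch of configuring the snowball analyzer with an explicit stemmer language (my_index and my_snowball are placeholder names; language defaults to English):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_snowball": {
          "type": "snowball",
          "language": "English"
        }
      }
    }
  }
}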