Analyzers

Analyzers are composed of a single Tokenizer and zero or more TokenFilters; the tokenizer may be preceded by one or more CharFilters. The analysis module allows you to register TokenFilters, Tokenizers and Analyzers under logical names that can then be referenced either in mapping definitions or in certain APIs. Built-in analyzers, token filters and tokenizers are registered automatically unless they are explicitly redefined.

An analyzer can be tested directly with the _analyze API, e.g. GET _analyze?analyzer=standard for the standard analyzer.
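
For instance, a quick smoke test (the sample text is illustrative; newer Elasticsearch versions expect these parameters in a JSON request body rather than the query string):

  # Analyze a sample sentence with the built-in standard analyzer
  GET _analyze?analyzer=standard&text=The 2 QUICK Brown-Foxes.

This should return lowercased terms such as quick, brown and foxes; whether the is removed depends on the stopwords setting.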

Standard Analyzer

standard

  • Standard Tokenizer
  • Standard Token Filter
  • Lower Case Token Filter
  • Stop Token Filter

Settings:

  • stopwords
  • max_token_length
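
Both settings can be overridden by deriving a new analyzer of type standard in the index settings. A minimal sketch (my_index and my_standard are placeholder names; the values shown are illustrative):

  # 'my_index' and 'my_standard' are placeholder names
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_standard": {
            "type": "standard",
            "stopwords": "_english_",
            "max_token_length": 255
          }
        }
      }
    }
  }

stopwords accepts a predefined list name such as _english_ or an explicit array of words; tokens longer than max_token_length are split at that length (it defaults to 255).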

Simple Analyzer

simple

  • Lower Case Tokenizer

Whitespace Analyzer

whitespace

  • Whitespace Tokenizer

Stop Analyzer

stop

  • Lower Case Tokenizer
  • Stop Token Filter

Settings:

  • stopwords (defaults to the English stop words; use stopwords: _none_ to explicitly specify an empty list)
  • stopwords_path
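
For example, a stop analyzer with a custom word list (my_stop is a placeholder name; stopwords_path would instead point to a stopword file under the node's config directory):

  # 'my_stop' is a placeholder analyzer name
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_stop": {
            "type": "stop",
            "stopwords": ["and", "is", "the"]
          }
        }
      }
    }
  }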

Keyword Analyzer

keyword

Treats the entire stream as a single token.
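
This makes it useful for exact-match fields such as identifiers. For example:

  GET _analyze?analyzer=keyword&text=New York City

This should return the single term New York City, where the standard analyzer would instead produce new, york and city.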

Pattern Analyzer

pattern

Flexibly separates text into terms via a regular expression.

Settings:

  • lowercase
  • pattern
  • flags
  • stopwords
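
Note that pattern is a Java regular expression matching the token separators, not the tokens themselves; it defaults to \W+ (non-word characters). A sketch of an analyzer that splits comma-separated values (comma_split is a placeholder name):

  # 'comma_split' is a placeholder analyzer name
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "comma_split": {
            "type": "pattern",
            "pattern": ",\\s*",
            "lowercase": true
          }
        }
      }
    }
  }

With this definition, Alpha, BETA,Gamma would analyze to alpha, beta and gamma.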

Language Analyzers

A set of language-specific analyzers (e.g. english, french, german). For details see the language analyzers documentation.
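
For example, the built-in english analyzer adds English stemming and stop word removal on top of standard tokenization:

  GET _analyze?analyzer=english&text=The quick foxes are jumping

This should yield terms along the lines of quick, fox and jump, with the stop words the and are removed.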

Snowball Analyzer

snowball

  • Standard Tokenizer
  • Standard Token Filter
  • Lower Case Token Filter
  • Stop Token Filter
  • Snowball Token Filter
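
The stemming language is chosen with a language setting (it defaults to English). A minimal sketch (my_snowball is a placeholder name):

  # 'my_snowball' is a placeholder analyzer name
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_snowball": {
            "type": "snowball",
            "language": "English"
          }
        }
      }
    }
  }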

Custom Analyzer

An analyzer of type custom that combines a Tokenizer of your choice with zero or more TokenFilters and CharFilters. For details see the dedicated article.
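
A minimal sketch of a custom analyzer that strips HTML, tokenizes with the standard tokenizer, then lowercases and removes stop words (my_index and my_custom are placeholder names; html_strip, standard, lowercase and stop are built-ins):

  # 'my_index' and 'my_custom' are placeholder names
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_custom": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
            "filter": ["lowercase", "stop"]
          }
        }
      }
    }
  }

The char filters run first over the raw text, the tokenizer then produces the token stream, and the token filters are applied in the order listed.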