Elasticsearch #

Installation #

Check whether the installation is running:

curl -X GET http://localhost:9200/
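If the node is up, the request returns a small JSON status document. The exact fields depend on the version; a rough sketch of a 2.x response (node and cluster names are placeholders):

{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "2.4.6" },
  "tagline" : "You Know, for Search"
}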

Configuration #

Notes:

Notes for 2.x: For instances running on different hosts to discover each other, network.host must match the corresponding entries in discovery.zen.ping.unicast.hosts on the other instances (see the sketch below). Note that the Kibana settings have to be adjusted accordingly (i.e. no longer http://localhost:8080, but explicitly http://sb-sXY.swissbib.unibas.ch:8080).

  1. See also https://www.elastic.co/guide/en/elasticsearch/reference/2.x/modules-network.html
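A minimal sketch of the corresponding elasticsearch.yml entries, assuming two hosts sb-s1.swissbib.unibas.ch and sb-s2.swissbib.unibas.ch (the hostnames are placeholders):

# elasticsearch.yml on sb-s1.swissbib.unibas.ch
network.host: sb-s1.swissbib.unibas.ch
discovery.zen.ping.unicast.hosts: ["sb-s2.swissbib.unibas.ch"]

# elasticsearch.yml on sb-s2.swissbib.unibas.ch
network.host: sb-s2.swissbib.unibas.ch
discovery.zen.ping.unicast.hosts: ["sb-s1.swissbib.unibas.ch"]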

Operation #

Start: ./bin/elasticsearch (use the -d parameter for daemon mode)
Stop: curl -XPOST 'http://localhost:9200/_shutdown'
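When started as a daemon, the process can also be stopped via its PID; a sketch, assuming the PID is written to a file with the -p option:

./bin/elasticsearch -d -p elasticsearch.pid
kill $(cat elasticsearch.pid)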

Structural comparison with a relational database #

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
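As an illustration, indexing a single document into the (hypothetical) index library with type book creates the index and type on the fly, roughly the way a row is inserted into a table:

curl -XPUT 'http://localhost:9200/library/book/1' -d '{"title": "Elasticsearch Basics", "year": 2015}'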

Communication via the RESTful API #

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

VERB: The appropriate HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
PROTOCOL: Either http or https (if you have an https proxy in front of Elasticsearch).
HOST: The hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.
PORT: The port running the Elasticsearch HTTP service, which defaults to 9200.
PATH: The API endpoint, which may consist of several components (for example _count or _cluster/stats).
QUERY_STRING: Any optional query-string parameters (for example ?pretty will pretty-print the JSON response to make it easier to read).
BODY: A JSON-encoded request body (if the request needs one).
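For example, counting all documents in the cluster (a sketch, assuming a node on localhost):

curl -XGET 'http://localhost:9200/_count?pretty' -d '{"query": {"match_all": {}}}'

The response is a JSON document that contains, among other things, a count field.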

Indexing a file #

curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>' -d @<PATHTOFILE>
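A sketch, assuming a JSON document stored in document.json and the (hypothetical) index library with type book:

curl -XPUT 'http://localhost:9200/library/book/1' -d @document.json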

Analyzer #

An analyzer is really just a wrapper that combines three functions into a single package:

Character filters: First, the string is passed through any character filters in turn. Their job is to tidy up the string before tokenization. A character filter could be used to strip out HTML, or to convert & characters to the word and.

Tokenizer: Next, the string is tokenized into individual terms by a tokenizer. A simple tokenizer might split the text into terms whenever it encounters whitespace or punctuation.

Token filters: Last, each term is passed through any token filters in turn, which can change terms (for example, lowercasing Quick), remove terms (for example, stopwords such as a, and, the) or add terms (for example, synonyms like jump and leap).
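These three building blocks can be combined into a custom analyzer in the index settings; a sketch (index and analyzer names are placeholders):

curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}'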

Specifying Analyzers #

When Elasticsearch detects a new string field in your documents, it automatically configures it as a full-text string field and analyzes it with the standard analyzer. You don’t always want this. Perhaps you want to apply a different analyzer that suits the language your data is in. And sometimes you want a string field to be just a string field—to index the exact value that you pass in, without any analysis, such as a string user ID or an internal status field or tag. To achieve this, we have to configure these fields manually by specifying the mapping.
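A sketch of such a mapping in ES 1.x/2.x syntax, assuming a (hypothetical) index library with type book, an English-language title field, and a tag field that should be indexed verbatim:

curl -XPUT 'http://localhost:9200/library' -d '{
  "mappings": {
    "book": {
      "properties": {
        "title": { "type": "string", "analyzer": "english" },
        "tag":   { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'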

Test analysis #

=> http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html

Using a built-in analyzer:
curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 'this is a test'

Custom analyzer:
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&token_filters=lowercase&char_filters=html_strip' -d 'this is a <b>test</b>'
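Both calls return the generated tokens as JSON. For the custom analyzer above, the HTML is stripped, the whole input is kept as a single keyword token and lowercased, so the response looks roughly like this (exact offsets and positions vary by version):

{
  "tokens": [
    { "token": "this is a test", "start_offset": 0, "end_offset": 21, "type": "word", "position": 0 }
  ]
}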

Test analysis based on a …

APIs #

Further information #