Elasticsearch 101

7.2.2014 | 9 minutes of reading time

Introduction

Elasticsearch is a highly scalable search engine that stores data in a structure optimized for language-based searches, and it is a whole lot of fun to work with. In this 101 I’ll give you a hands-on introduction to Elasticsearch and a glimpse at some of its key concepts.

open-source
distributed: clustering, replication, failover, and master election out of the box
schema-less: document based, automated type mapping, JSON
RESTful API
highly configurable
sane defaults
data exploration
based on Lucene
runs on the JVM
fast

Running Elasticsearch

Elasticsearch is a standalone Java application, so getting up and running is a piece of cake. Make sure you have Java ≥ 1.6.0 and that no one else is running Elasticsearch in your network (otherwise your node may join their cluster). You can either download a packaged distribution from elasticsearch.org and unpack it, or get the latest source from GitHub.

Start a node in foreground with

$ bin/elasticsearch -f

You should see something like this

You can see in the output that Elasticsearch started, that it assigned the node a random name, and that the node started a cluster and elected itself as master. The node is publishing to HTTP port 9200 (default). We use this port to interact with the cluster. Now you can

$ curl localhost:9200

and you get some data about your node

Elasticsearch has a JSON-based REST API. Administrative operations, indexing and searching: everything is done with HTTP and JSON. I use cURL to keep the examples concise. You may find it more convenient to use a graphical HTTP client. There are also a number of browser extensions and plugins. If you use Google Chrome, I recommend the Sense plugin for Chrome, a JSON-aware developer console for Elasticsearch.

Let’s start up another node and pass it the name NODE_2 as a parameter

$ bin/elasticsearch -f -Des.node.name=NODE_2

You can see that NODE_2 started. The node detected the other node as master and joined the cluster. The new node publishes to HTTP port 9201. You can talk to 9201 as well, as each node behaves the same. Any node starting up will join this cluster if, and only if, it shares the cluster name. As we haven’t defined a cluster name, the default name is used.
You now have an Elasticsearch cluster running with two nodes.

Install Head plugin

This is an optional step. The Head plugin, or elasticsearch-head, is a web front end for browsing and interacting with an Elasticsearch cluster. For more details on available plugins refer to the plugin guide.

To install it run

$ bin/plugin -install mobz/elasticsearch-head

Then open http://localhost:9200/_plugin/head/ in your browser

Indexing

A search engine is something like a database, with a difference in how the data is stored. The structure is loosely similar to that of a conventional relational database such as MySQL or Postgres.

Elasticsearch – Relational database
Index – Database
Type – Table
Document – Row
Field – Column

Elasticsearch creates an inverted index. http://en.wikipedia.org/wiki/Inverted_index

Everything is indexed in this data structure. It allows you to quickly find all the documents that contain a particular word, much like the index at the back of a book.
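To make the idea of an inverted index a bit more tangible, here is a minimal sketch in Python. It is not how Lucene implements it, just the bare principle: every word points to the documents that contain it. The sample texts are made up for this illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents, made up for this illustration.
documents = {
    "one": "The troll speaks english",
    "two": "English is a language",
}

inverted_index = build_inverted_index(documents)
print(sorted(inverted_index["english"]))  # prints ['one', 'two']
```

Looking up a word is a single dictionary access, no matter how many documents have been indexed.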

Let’s index something. We send off an HTTP PUT with the URL made up of the index name, type name and ID, and in the HTTP payload we supply a JSON document with the fields and values. Notice that the field author contains another, nested JSON object.
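If you would rather build such a request in code, a rough sketch in Python could look like this. The index documents, the type blog, the ID one and the nested author object are taken from this post; the remaining field values are hypothetical placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical field values; only the index "documents", the type "blog",
# the ID "one" and the nested author object come from the article.
document = {
    "title": "A hypothetical blog post",
    "word_count": 120,
    "author": {"name": "Pip the Troll"},
}

# URL layout: /<index>/<type>/<id>
url = "http://localhost:9200/documents/blog/one"

request = urllib.request.Request(
    url,
    data=json.dumps(document).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Uncomment to actually send the request to a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```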

Indexing in Elasticsearch corresponds to create and update in CRUD. If we try to index a document with an ID that already exists it is overwritten. Index and type are required while the ID is optional. If we do not specify an ID, then Elasticsearch will generate one.

And we get a response that verifies that the operation was successful

We get the name of the index, the type, and the ID. We also get a version number, although Elasticsearch does not keep a history of earlier versions. The version is used for optimistic concurrency control and is incremented with every change. The data we supplied was all Elasticsearch needed: it automatically created the index for us.
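Optimistic concurrency control is easier to grasp with a toy example. The following sketch is an in-memory stand-in, not Elasticsearch code; the class and method names are made up. A write that states which version it expects is rejected if the document has changed in the meantime.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy in-memory store that bumps a version number on every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current_version, _ = self._docs.get(doc_id, (0, None))
        # Reject the write if someone else changed the document in between.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
first = store.index("one", {"title": "draft"})           # version 1
second = store.index("one", {"title": "edited"}, first)  # version 2

try:
    store.index("one", {"title": "stale edit"}, first)   # still expects 1
except VersionConflict as error:
    print("conflict:", error)
```

The stale writer gets an error instead of silently overwriting the newer document, and nothing had to be locked along the way.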

Mapping

Elasticsearch is schema-less in the sense that you don’t have to define a schema up front. It uses mappings, which are basically a schema, but they are created automatically, which makes working with them much easier.

To see the mapping of our indexed blog

$ curl localhost:9200/documents/_mapping

Here you get the mapping for the index documents. There is the type blog and a list of all properties from our blog. Elasticsearch infers the data types automatically. If Elasticsearch doesn’t infer the right type for your field, you can supply a mapping when you create the index.
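How such an automatic mapping can come about is sketched below. This is a heavily simplified illustration and not Elasticsearch’s actual logic, which also deals with dates, arrays and more. The function names are made up, and the type names (string, long, double, boolean) are the ones I assume for the Elasticsearch versions of this era.

```python
def infer_type(value):
    """Guess a mapping entry from a JSON value (simplified illustration)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, dict):
        # Nested JSON objects get their own nested properties.
        return {"properties": infer_mapping(value)}
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(document):
    """Infer a mapping entry for every field of a document."""
    return {field: infer_type(value) for field, value in document.items()}


# Hypothetical document with a nested author object.
mapping = infer_mapping({"word_count": 120, "author": {"name": "Pip the Troll"}})
print(mapping)
```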

Getting data

To get a document, we send an HTTP GET with the index name, type and ID.

$ curl -XGET "http://localhost:9200/documents/blog/one"

The response looks like this

We get metadata information and the source field containing the JSON that we have indexed.

Searching

A document needs to be indexed before you can search for it. Elasticsearch refreshes its indices every second by default, so a newly indexed document shows up in search results after about a second at most.

Let’s search for our data set

$ curl -XGET "http://localhost:9200/documents/blog/_search?q=_id:one"

This is a query on the ID one. It uses the search facilities, but as we are looking for an ID, it can only result in one or zero documents.

Lucene under the hood

Elasticsearch is built on top of Lucene, a long-established Java library that is proven, tested and the best of its kind in open source search software. Everything related to indexing and searching text is implemented in Lucene. Elasticsearch builds an infrastructure around Lucene. While Lucene is a great tool, it can be cumbersome to use directly, and it doesn’t provide any facilities to scale past a single node. Elasticsearch provides an easier, more intuitive API as well as the infrastructure and operational tools for simple scalability across multiple nodes. The REST API also allows interoperability with non-Java languages.

A shard in Elasticsearch refers to a Lucene index. Elasticsearch by default uses five shards for each Elasticsearch index. A document is stored in one shard. Elasticsearch supports replica shards. One replica is configured by default.
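How does a document end up in exactly one shard? Elasticsearch hashes a routing value (by default the document ID) and takes the result modulo the number of primary shards. The sketch below shows that principle; the CRC32 hash is only a stand-in I picked for illustration, not the hash function Elasticsearch uses, and the function name is made up.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 5  # the default mentioned above


def pick_shard(doc_id, number_of_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Pick a shard by hashing the document ID (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


for doc_id in ("one", "two"):
    print(doc_id, "-> shard", pick_shard(doc_id))
```

The same ID always lands on the same shard, which is also why the number of primary shards cannot simply be changed after an index has been created.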

Let’s put in another document

Now let’s search for the term english

We get two hits

We can also search on a specific field. Nested fields are addressed with a dot separator, as in the next example.

We get a result that matches the author name Pip the Troll

In the last example we used a prefix query. For more types of queries refer to the elasticsearch.org guide. We are not providing the full value of the field. This is the gist of a search engine: we don’t need to know exactly what we are searching for. We can provide what we know and get results that might be relevant, as opposed to only results that match exactly. We can find word stems, synonyms, misspellings, and we can even provide autocompletion.
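As a rough sketch of what a prefix query does, the snippet below first builds a request body for a prefix query on the nested field author.name and then mimics the matching locally. The prefix values and the list of indexed terms are assumptions made for this illustration; how a value is really split into terms depends on the analyzer.

```python
import json

# Hypothetical prefix; "author.name" uses the dot notation for nested fields.
query_body = {"query": {"prefix": {"author.name": "pip"}}}
print(json.dumps(query_body))


def prefix_match(terms, prefix):
    """Return the indexed terms that start with the given prefix."""
    return [term for term in terms if term.startswith(prefix)]


# Lowercased words from "Pip the Troll"; real analysis may differ,
# for example by dropping stop words such as "the".
indexed_terms = ["pip", "the", "troll"]
print(prefix_match(indexed_terms, "tr"))  # prints ['troll']
```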

Clustering

You can get some information about your cluster with

$ curl -XGET "http://localhost:9200/_cluster/health"

Alternatively, if you have installed the Head plugin you can open http://localhost:9201/_plugin/head/. We have a status, which is green. We have 5 active primary shards and 10 active shards in total, because each primary shard has one replica on the other node. Our indexed documents are available on each node.

Let’s shut down our master. Go to the console of your master node and press CTRL-C, then look at the console output of NODE_2.

You can see that NODE_2 noticed that the master has left the cluster and elected itself as new master. Check the status again

$ curl -XGET "http://localhost:9201/_cluster/health"

Make sure to use port 9201 of NODE_2 not 9200.

You can see that the cluster status is now yellow, because with one of our two nodes gone the replica shards are unassigned. The search functionality of the cluster is still working, though. If you send our search requests from earlier again, but this time to port 9201 of NODE_2, you still get the search results, because everything we have indexed is available on every node.
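The meaning of the status colors can be summed up in a few lines: green means all primary and replica shards are assigned, yellow means all primaries are assigned but at least one replica is not, and red means at least one primary is missing. Here is a small sketch of that rule; the function and the way shards are represented are made up for this illustration.

```python
def cluster_status(shards):
    """Derive green/yellow/red from (is_primary, is_assigned) pairs."""
    primaries_ok = all(assigned for primary, assigned in shards if primary)
    replicas_ok = all(assigned for primary, assigned in shards if not primary)
    if not primaries_ok:
        return "red"
    return "green" if replicas_ok else "yellow"


# Two nodes: 5 primaries and 5 replicas, all assigned.
two_nodes = [(True, True)] * 5 + [(False, True)] * 5
# One node left: the 5 replicas have nowhere to go.
one_node = [(True, True)] * 5 + [(False, False)] * 5

print(cluster_status(two_nodes), cluster_status(one_node))  # prints green yellow
```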

If you start our previous master back up and check on the cluster status it’ll be back in status green.

Facets

At some point you will be interested in information about the data you have indexed. In our case we might want to know the average number of words over all indexed blog articles. Elasticsearch has a feature called facets that provides aggregated statistics about a query. It is a core part of Elasticsearch and part of the search API. Facets are always bound to a query and provide aggregate statistics alongside the query results. They are highly configurable and can return complex groupings of nested filters, ranges of amounts or ranges of time. Even full Elasticsearch queries can be nested inside a facet. In Elasticsearch 1.0 this feature will be called Aggregations and is supposed to have more features and be more composable.

Here we define a facet together with a match all query. The facet is a predefined statistical facet for number fields, in this case our word_count field.
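A request body along these lines might look like the sketch below, followed by a local computation of a subset of the numbers a statistical facet reports. The facet name word_stats and the sample word counts are hypothetical, and the body follows the legacy facets API as I understand it.

```python
import json

# "word_stats" is a hypothetical facet name; word_count is the field from above.
search_body = {
    "query": {"match_all": {}},
    "facets": {"word_stats": {"statistical": {"field": "word_count"}}},
}
print(json.dumps(search_body, indent=2))


def statistical(values):
    """Compute a subset of what a statistical facet reports."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "total": total,
        "min": min(values),
        "max": max(values),
        "mean": total / count,
    }


# Hypothetical word counts of two indexed blog posts.
print(statistical([120, 80]))
```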

In the response we receive the query results first, followed by the facet response with statistical numbers on our word_count field. There are further predefined facets you can choose from to get statistical information about your data. There is also Kibana, a graphical front end for data analysis that is specifically tailored to Elasticsearch and has become very popular.

That’s it!

No, of course that’s not it. It’s just all I wanted to show you in this 101. I’ve introduced you to quite a few topics, but we have barely scratched the surface of what there is to discover about Elasticsearch.

The query API, for instance, has much more to offer than we have covered. There are many interesting types of queries and filters that can be used. To get the most out of natural-language searches and other complex types of data, you’ll get in touch with analyzers. Analyzers are the tools that slice and dice words into stems to create an efficient search space for natural languages. Word stemming allows Elasticsearch to find linguistically similar words. Percolators are another very interesting topic. Percolators allow you to index queries and then send documents to Elasticsearch to find out which queries match each document. So the entire operation is turned around and runs in the reverse direction. And there is even more.
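The reversed direction of percolation is perhaps easiest to see in a toy example. In this sketch the "queries" are just sets of words that must all occur in a document; the names and sample data are made up, and real percolators work with full Elasticsearch queries.

```python
# Registered "queries": here simply sets of words that must all occur.
registered_queries = {
    "troll-alert": {"troll"},
    "english-posts": {"english", "blog"},
}


def percolate(document_text, queries):
    """Return the names of all registered queries that match the document."""
    words = set(document_text.lower().split())
    return sorted(name for name, required in queries.items() if required <= words)


print(percolate("A blog post about a troll", registered_queries))  # prints ['troll-alert']
```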

I hope you found this post interesting and useful on your quest to discover this awesome piece of technology. Thank you for reading, and stay in the loop for more posts to come on Elasticsearch. In the meantime I’ve put together a list of links, which you can find below. There is also a great interview with GitHub about Elasticsearch at scale.

Where to go from here

Visit the elasticsearch.org guide
You may follow the elasticsearch.org blog
Tutorials can be found at elasticsearch.org tutorials
Find Elasticsearch projects on GitHub
Article about Elasticsearch Analyzers
Get Elasticsearch running on EC2
To use Elasticsearch with Java go to the elasticsearch.org Java API
Marvel is a visual cluster health and metrics dashboard
For graphical data analysis check out Kibana

To learn more about Lucene go to the Lucene documentation or visit the Lucene wiki.

Interview with GitHub about Elasticsearch at scale

http://exploringelasticsearch.com/book/elasticsearch-at-scale-interviews/interview-with-the-github-elasticsearch-team.html

Blog author

Dennis Probst
