Track 1 Indexing Fundamentals

Abstract

This track covers the fundamentals of indexing (i.e. creating documents) in ElasticSearch, and how to take control of the way document contents are stored through mappings.

The basics

Why indexing?

Refer to the ElasticSearch inverted index guide.

Tip

ElasticSearch is a document-oriented database: you can create, delete, and analyze documents, but above all search them. ElasticSearch can store and manage data in many ways, but its main purpose is to search within your data.

ElasticSearch is a search engine that works much like the index of a book: when you need to find a particular term in a textbook, the most efficient way is to look it up in the "index" section, where words are listed alongside the page numbers on which they appear. ElasticSearch relies on a library called Lucene that maps terms to the documents in which they occur, just like the "index" section of the textbook. The underlying data structure is called an inverted index.
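To make the book-index analogy concrete, here is a minimal Python sketch of an inverted index. It is only an illustration: the sample documents and the function name `build_inverted_index` are hypothetical, and Lucene's real data structures are far more elaborate.

```python
from collections import defaultdict

# Hypothetical sample documents: document ID -> text.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(documents)
print(index["quick"])  # [1, 3]
print(index["dog"])    # [2, 3]
```

Looking up a term is a single dictionary access, in the same way that a reader jumps from a word in the book index straight to its page numbers.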

Question

Which task takes longer, indexing or querying? Why?

Tip

Preparing and indexing data (splitting text into tokens, lower-casing them, and so on, then writing them into an inverted index) is the longer task, but it is what makes searching so fast.
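The preparation step described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in, assuming tokens are plain runs of ASCII letters and digits; the function name `analyze` is hypothetical and ElasticSearch's built-in analyzers handle many more cases.

```python
import re

def analyze(text):
    """Split text into alphanumeric tokens and lower-case each of them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

tokens = analyze("Ever running application is RUNNING!")
print(tokens)  # ['ever', 'running', 'application', 'is', 'running']

# Normalising a query term the same way lets "RUNNING" match "running".
query_terms = analyze("RUNNING")
print(query_terms)  # ['running']
```

Because this work is done once at index time, a query only has to normalise its own terms and look them up.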

Refer to the ElasticSearch getting started.

Question

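What happens if you re-index the exact same document? And what if you re-index a document after changing the value of a field?

Documents are immutable, so in both cases a new version of the document is created: the previous version is marked as deleted and the _version number is incremented, even if the content is identical.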
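The behaviour can be pictured with a toy Python model. This is not how ElasticSearch is implemented: the class `ToyIndex` and the sample field are hypothetical, and the model only mimics the `_version` and `result` values you would typically see when indexing the same ID repeatedly.

```python
class ToyIndex:
    """Toy model: each write to an existing ID replaces the document and bumps its version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        previous = self._docs.get(doc_id)
        version = previous[0] + 1 if previous else 1
        # Store a copy: the previous version is replaced, never edited in place.
        self._docs[doc_id] = (version, dict(source))
        return {
            "_id": doc_id,
            "_version": version,
            "result": "updated" if previous else "created",
        }

toy = ToyIndex()
first = toy.index("track1", {"level": "INFO"})
second = toy.index("track1", {"level": "INFO"})  # exact same document
third = toy.index("track1", {"level": "WARN"})   # one field changed
print(first["result"], first["_version"])    # created 1
print(second["result"], second["_version"])  # updated 2
print(third["result"], third["_version"])    # updated 3
```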

Question

When you delete a document, is it immediately deleted from the disk?

When we delete a document, we can no longer read or access it, but it still exists on disk, so the space is not freed immediately. ElasticSearch only marks the document as deleted; documents marked as deleted are physically removed later, when the underlying Lucene segments are merged.
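The "mark now, clean up later" behaviour can be sketched with a toy Python model. The names `ToySegment` and `merge` are hypothetical, and the model ignores everything about real Lucene segments except the fact that deleted documents are only dropped when data is rewritten.

```python
class ToySegment:
    """Toy model of an immutable segment plus a set of deletion markers."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> source, never modified afterwards
        self.deleted = set()    # IDs marked as deleted

    def delete(self, doc_id):
        # Only a marker is recorded; the stored document is left untouched.
        self.deleted.add(doc_id)

    def get(self, doc_id):
        if doc_id in self.deleted:
            return None
        return self.docs.get(doc_id)

    def stored_count(self):
        return len(self.docs)

def merge(segments):
    """Build a new segment that keeps only the documents not marked as deleted."""
    live = {}
    for segment in segments:
        for doc_id, source in segment.docs.items():
            if doc_id not in segment.deleted:
                live[doc_id] = source
    return ToySegment(live)

segment = ToySegment({"a": {"level": "INFO"}, "b": {"level": "WARN"}})
segment.delete("a")
print(segment.get("a"))        # None: the document is no longer readable
print(segment.stored_count())  # 2: it still occupies space

merged = merge([segment])
print(merged.stored_count())   # 1: the space is reclaimed after the merge
```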

Index mappings

Refer to the ElasticSearch mapping documentation.

Question

Can you prevent dynamic mapping?

See the following exercise.

Exercise: dynamic mapping

Open the Kibana DevTools and check the mapping of the index platform-logs-YYYY.MM.DD, where YYYY.MM.DD is the current date:

GET /platform-logs-YYYY.MM.DD

You can see the mapping is dynamic with the line "dynamic" : "true". All the fields of the index are listed within mappings.properties.
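If you prefer to inspect the response programmatically, the following Python sketch walks mappings.properties and lists every field by its dotted name. The response shown is a hypothetical, heavily shortened example (the field types are assumptions), and `list_fields` is an illustrative helper, not part of any client library.

```python
def list_fields(properties, prefix=""):
    """Return the dotted names of all leaf fields under a mappings.properties dict."""
    names = []
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "properties" in definition:
            # Object field: recurse into its sub-fields.
            names.extend(list_fields(definition["properties"], prefix=f"{path}."))
        else:
            names.append(path)
    return sorted(names)

# Hypothetical, shortened response of GET /platform-logs-YYYY.MM.DD.
response = {
    "platform-logs-YYYY.MM.DD": {
        "mappings": {
            "dynamic": "true",
            "properties": {
                "@timestamp": {"type": "date"},
                "content": {
                    "properties": {
                        "level": {"type": "keyword"},
                        "message": {"type": "text"},
                    }
                },
                "vendor": {"type": "keyword"},
            },
        }
    }
}

mappings = response["platform-logs-YYYY.MM.DD"]["mappings"]
print(mappings["dynamic"])                  # true
print(list_fields(mappings["properties"]))
# ['@timestamp', 'content.level', 'content.message', 'vendor']
```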

Now, let's try to add a field that doesn't exist in the mapping, such as track1_with_dynamic:

PUT /platform-logs-YYYY.MM.DD/_doc/track1
{
  "track1_with_dynamic": "i love punchplatform",
  "@timestamp": "2020-10-01T12:59:02.463Z",
  "content": {
    "event_type": "application_running",
    "level": "INFO",
    "logger": "org.thales.punch.apps.shiva.worker.impl.WorkerImpl",
    "message": "ever running application is running",
    "shiva_app_name": "tenants/platform/channels/monitoring/local_events_dispatcher"
  },
  "init": {
    "host": {
      "name": "punchplatform"
    },
    "process": {
      "id": "29885@punchplatform",
      "name": "shiva_worker"
    },
    "user": {
      "name": "trainee"
    }
  },
  "platform": {
    "application": "local_events_dispatcher",
    "channel": "monitoring",
    "id": "standalone",
    "tenant": "platform"
  },
  "target": {
    "cluster": {
      "name": "local"
    },
    "type": {
      "name": "shiva"
    }
  },
  "type": "punch",
  "vendor": "thales",
  "es_ts": "2020-10-01T14:59:02.469+02:00"
}

Check the mapping of the index again. Can you see the newly created field?

Now, let's disable dynamic mapping. Use the /_mapping endpoint to do so:

PUT /platform-logs-YYYY.MM.DD/_mapping
{
 "dynamic": false
}

Now try to add another document with the ID track-without-dynamic and a new field called track1_without_dynamic. What happens? Has the mapping of the index changed?

Check the content of the document with the following request:

GET /platform-logs-YYYY.MM.DD/_doc/track-without-dynamic

The field track1_without_dynamic appears in the document because it is still stored in _source; it has only been ignored by the mapping, so it is not indexed and cannot be searched. To ensure that no new fields can be added, set the dynamic mapping to "strict":

PUT /platform-logs-YYYY.MM.DD/_mapping
{
 "dynamic": "strict"
}

Now create a new document with a new field. What happens?

An error is thrown because of the strict mapping and the document isn't created.
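The three settings explored in this exercise can be summarised with a toy Python model. It only handles top-level field names, uses the strings "true", "false" and "strict" for the setting, and the function `index_document` is hypothetical; it mimics the observable behaviour, not ElasticSearch internals.

```python
def index_document(mapped_fields, document, dynamic):
    """Toy model of the `dynamic` setting for top-level fields.

    Returns (updated mapped fields, stored source, searchable fields).
    """
    unknown = [name for name in document if name not in mapped_fields]
    if dynamic == "strict" and unknown:
        # Strict mode: the whole document is rejected.
        raise ValueError(f"strict mapping: unknown fields {unknown}")
    new_mapping = set(mapped_fields)
    if dynamic == "true":
        # Dynamic mode: unknown fields are added to the mapping.
        new_mapping.update(unknown)
    # With "false", unknown fields stay in the source but are not searchable.
    searchable = sorted(name for name in document if name in new_mapping)
    return new_mapping, dict(document), searchable

mapped = {"type", "vendor"}
doc = {"type": "punch", "track1_without_dynamic": "hello"}

mapping_true, _, searchable_true = index_document(mapped, doc, "true")
mapping_false, source_false, searchable_false = index_document(mapped, doc, "false")

try:
    index_document(mapped, doc, "strict")
    rejected = False
except ValueError:
    rejected = True

print(searchable_true)   # ['track1_without_dynamic', 'type']
print(searchable_false)  # ['type']
print(rejected)          # True
```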

Analyzers in mappings

Tip

If you have textual data, you can control how it is processed by ElasticSearch by defining a text analyzer in the mapping.
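As an illustration, the Python sketch below builds the kind of request body that could be used when creating an index with an analyzer attached to a text field. The field name "message" and the choice of the built-in "english" analyzer are assumptions made for the example, not settings taken from the platform-logs indices.

```python
import json

# Hypothetical index-creation body: the field name and analyzer choice are
# illustrative assumptions.
create_index_body = {
    "mappings": {
        "properties": {
            "message": {"type": "text", "analyzer": "english"},
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

The printed JSON can be pasted into the Kibana DevTools as the body of an index-creation request.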

Refer to the ElasticSearch analyzer documentation.