Kibana best practices

This section covers query performance optimizations that can be made through Kibana. They are useful when the Elasticsearch cluster is heavily loaded and/or holds a huge amount of data. Following these best practices will prevent you from overloading the cluster uselessly.

Globally, think before you type

Just like with any database, the same query can be expressed in different ways that are more or less efficient. Even though Elasticsearch pre-processes the query somewhat before executing it, that does not prevent the user from crafting inefficient requests. Furthermore, experience shows that thinking about query shaping helps the user better understand what they are really looking for.

Don't hesitate to factorize your requests: it costs less typing, and the query is clearer and obviously shorter (a rough sketch of the equivalent query DSL follows this list):

  • `init.usr.name: "dimi" OR init.usr.name: "geoff" OR init.usr.name: "ced"` becomes `init.usr.name: ("dimi" OR "geoff" OR "ced")`;
  • `target.host.ip: 192.168.0.0/16 AND target.host.ip: 192.168.20.0/24` becomes `target.host.ip: 192.168.20.0/24` (the /24 subnet is already contained in the /16);
  • `col.host.ip: 172.16.55.33 OR col.host.ip: 172.16.55.34 OR col.host.ip: 172.16.55.35` becomes `col.host.ip: [172.16.55.33 TO 172.16.55.35]`.
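
As a minimal sketch of what such factorized filters roughly compile to in the underlying query DSL (the field names are the ones used above; the `logs-*` index pattern is only a placeholder), a single `terms` clause replaces three separate `OR`'ed matches:

```
# Hypothetical example, runnable from Kibana Dev Tools; "logs-*" is a placeholder index pattern.
GET logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "init.usr.name": ["dimi", "geoff", "ced"] } },
        { "term": { "target.host.ip": "192.168.20.0/24" } }
      ]
    }
  }
}
```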

Avoid unnecessary filter matching tools

Every filter that is not an exact match consumes far more resources when executed. Think about your request and consider the following options one by one; the higher an option sits in the list below, the better (a short cost comparison follows the list):

  • Exact matches are extremely efficient, for unanalyzed fields as well as analyzed ones (after all, unanalyzed fields are indexed just like analyzed ones). Even a list filter is very efficient (e.g. `init.usr.name: ["dimi", "ced", "jeff"]`): some searches in Elasticsearch 2 yielded a result while looking for 1,600 terms in 50 trillion (50,000,000,000,000) documents across 120 indexes on 24 nodes in ... 4.4 seconds;
  • Partial matches covered by analyzers are also efficient, whether relying on the token splitting performed by analysis (e.g. finding "The quick brown fox jumps over the lazy dog") or on the IP type (e.g. finding documents matching 192.93.158.0/25);
  • The Lucene globbing feature can be efficient, provided wildcards are avoided at the beginning of the filter (`init.usr.name: geoff*`). It also works with analyzed fields;
  • Regexes are to be used only as a very last resort. Lucene indexing does not handle these filters well most of the time, especially when there are variable parts at the beginning (e.g. `[0-9][0-9]zerty`). See the Kibana reference guide for more information.
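
To make the cost difference concrete, here is a rough sketch (field names reused from above, `logs-*` is only a placeholder): the first request is resolved directly against the inverted index, while the second forces Lucene to walk a large part of the terms dictionary because of the leading wildcard.

```
# Cheap: exact terms lookup, answered straight from the inverted index.
GET logs-*/_search
{
  "query": { "terms": { "init.usr.name": ["dimi", "ced", "jeff"] } }
}

# Expensive: leading wildcard, Lucene has to scan the terms dictionary of the field.
GET logs-*/_search
{
  "query": { "wildcard": { "init.usr.name": { "value": "*eoff" } } }
}
```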

Globally, the use of analyzers is encouraged: mind the difference in request efficiency between `*.server.tm` and `server.tm` when the field is analyzed ...
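
For instance, assuming a hypothetical analyzed field `target.host.name` whose analyzer splits host names on dots (and a `.keyword` sub-field holding the raw value), the two requests below find similar documents, but only the first stays cheap:

```
# Cheap: the analyzer has already split "www.server.tm" into tokens,
# so a plain match on "server.tm" can use the inverted index directly.
GET logs-*/_search
{
  "query": { "match": { "target.host.name": "server.tm" } }
}

# Expensive: a leading wildcard on the raw value bypasses analysis entirely.
GET logs-*/_search
{
  "query": { "wildcard": { "target.host.name.keyword": { "value": "*.server.tm" } } }
}
```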

Visualization optimizations

Many query problems are caused by visualization querying. One way to properly test a visualization's performance is to build it on a short time frame and, when you are ready, benchmark it against an uncached index, in other words an index that you have not queried for a long time (for instance, select a day one month ago).

  • Avoid analyzed fields in bucketing: the cost of bucketing depends on the number of distinct values in the bucketed field, and an analyzed field obviously contains far more values than a raw one;
  • Avoid unnecessary bucketing, especially on extra fields: adding a bucket often means requesting an aggregation based on a new field, so Elasticsearch is compelled to load one more field into RAM for your query. For instance, a date histogram split by `vendor`, then by `init.usr.name` and `target.usr.name` over 30 days makes Elasticsearch load 4 fields into RAM for each index (30 at minimum), which is heavy. Is a field that is already fixed by a filter in your query worth bucketing on? Clearly not. The same goes for a destination port that is always the same. A rough sketch of such an aggregation follows this list.
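
The sketch below is a hypothetical approximation of the aggregation such a visualization generates (the `logs-*` pattern, the `@timestamp` field, and the daily interval are illustrative assumptions; the other field names are the ones above). Each nested `terms` level is one more field whose values must be loaded for every index covered by the time range:

```
# Hypothetical sketch of a date histogram split by three fields.
GET logs-*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" },
      "aggs": {
        "by_vendor": {
          "terms": { "field": "vendor" },
          "aggs": {
            "by_init_user": {
              "terms": { "field": "init.usr.name" },
              "aggs": {
                "by_target_user": { "terms": { "field": "target.usr.name" } }
              }
            }
          }
        }
      }
    }
  }
}
```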

Dashboard handling

Although it is convenient to use dashboards to get a quick glance at your data, a few good habits are worth following:

  • Obviously, when a visualization monopolizes a lot of resources, there is no magic: the dashboard rendering it will be slow as well. It quickly becomes much worse when several heavy visualizations are combined in a single dashboard;
  • More generally, consider the number of visualizations really needed in a dashboard: will the user scroll across the dashboard? If not, the visualizations below the visible window are not needed;
  • Mind the time frame: the shorter the better, especially when the time frame covers several indexes (the last 7 days ...);
  • Finally, mind the refresh interval (especially for dashboards displayed on a static screen). A per-second refresh rate is a bit high; prefer a 10-minute, hourly, or even daily refresh rate.