Kibana best practices
This section covers the query performance optimizations that can be made through Kibana. They are useful when the Elasticsearch database is under load and/or holds a huge amount of data. Following these best practices will prevent you from overloading the database needlessly.
Globally, think before you type
As with any database, the same query can be expressed in different ways, some more efficient than others. Even though Elasticsearch pre-processes the query a bit before executing it, this does not prevent the user from crafting inefficient requests. Furthermore, experience shows that thinking about how a query is shaped helps users better understand what they really want.
Don't hesitate to factorize requests: it takes less typing, and the query is clearer and shorter:
- `init.usr.name: "dimi" OR init.usr.name: "ced" OR init.usr.name: "jeff"` becomes `init.usr.name: ("dimi" OR "ced" OR "jeff")`;
- `target.host.ip: 192.168.0.0/16 AND target.host.ip: 192.168.20.0/24` becomes `target.host.ip: 192.168.20.0/24`, since the /24 range is already contained in the /16 range;
- `col.host.ip: 172.16.55.33 OR col.host.ip: 172.16.55.34 OR col.host.ip: 172.16.55.35` becomes `col.host.ip: [172.16.55.33 TO 172.16.55.35]`.
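The first factorization can also be produced programmatically, for instance when the list of values comes from a script. The following is a minimal sketch, not part of Kibana: the helper name `factorize` is hypothetical, and it only assumes the Lucene query string convention of grouping values in parentheses after the field name.

```python
def factorize(field, values):
    """Build `field: ("a" OR "b" ...)` from one field and several exact values.

    Hypothetical helper: it only assembles a Lucene-style query string.
    """
    # Deduplicate while keeping the first-seen order.
    unique = list(dict.fromkeys(values))
    if not unique:
        raise ValueError("at least one value is required")
    # Quote each value, escaping backslashes and double quotes.
    quoted = [
        '"{}"'.format(v.replace("\\", "\\\\").replace('"', '\\"'))
        for v in unique
    ]
    if len(quoted) == 1:
        return f"{field}: {quoted[0]}"
    return f"{field}: ({' OR '.join(quoted)})"


print(factorize("init.usr.name", ["dimi", "ced", "jeff"]))
```

The resulting string can be pasted into the Kibana search bar as is.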
Avoid unnecessary filter matching tools
Every filter that is not an exact match consumes far more resources when executed. Think about your request and consider the following options one by one; the higher in the list, the better:
- Exact matches are extremely powerful, for unanalyzed fields as well as analyzed ones (after all, unanalyzed fields are indexed the same way as analyzed ones). Even a table filter is very efficient (`init.usr.name: ["dimi", "ced", "jeff"]`): some searches in Elasticsearch 2 returned a result when looking for 1600 terms in 50 trillion (50,000,000,000,000) documents across 120 indexes on 24 nodes in ... 4.4 seconds;
- Partial matches covered by analyzers are also powerful, when relying on the token splitting done at analysis time (e.g. finding a word in "The quick brown fox jumps over the lazy dog"), or on the IP type (e.g. finding documents matching 18.104.22.168/25);
- Lucene globbing can be powerful, provided wildcards are avoided at the beginning of the filter (`init.usr.name: geoff*`). It also works with analyzed fields;
- Regexes are to be used as a very last resort. The Lucene index does not handle these filters efficiently most of the time, especially when the pattern varies at the beginning (e.g. `[0-9][0-9]zerty`). See the Kibana reference guide for more information.
Overall, the use of analyzers is encouraged: mind the request efficiency of `server.tm` when the field is analyzed ...
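The ranking above can be turned into a rough check on a single `field: value` clause. This is only a sketch under stated assumptions: the function `rank_filter` and its labels are hypothetical, the patterns are simple heuristics rather than a real Lucene parser, and regexes are assumed to be written between forward slashes as in the Lucene query string syntax.

```python
import re

# `field:` followed by a regular expression written between slashes.
_REGEX = re.compile(r'[\w.]+\s*:\s*/')
# `field:` followed by a value that starts with a wildcard.
_LEADING_WILDCARD = re.compile(r'[\w.]+\s*:\s*[*?]')
# `field:` followed by an unquoted value containing a wildcard further on.
_WILDCARD = re.compile(r'[\w.]+\s*:\s*[^\s"()\[\]/]*[*?]')


def rank_filter(clause):
    """Return a rough cost label for one `field: value` clause (heuristic)."""
    clause = clause.strip()
    if _REGEX.match(clause):
        return "regex"             # last resort
    if _LEADING_WILDCARD.match(clause):
        return "leading wildcard"  # globbing to avoid
    if _WILDCARD.match(clause):
        return "wildcard"          # acceptable globbing
    return "exact or analyzed match"


for clause in ('init.usr.name: "dimi"',
               'init.usr.name: geoff*',
               'init.usr.name: *off',
               'init.usr.name: /[0-9][0-9]zerty/'):
    print(clause, "->", rank_filter(clause))
```

Such a check could be run on saved searches before they are added to a dashboard, to spot the clauses worth rewriting first.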
Many query problems are caused by the queries issued by visualizations. One way to test the performance of a visualization properly is to build it over a short time frame and, once it is ready, benchmark it against an uncached index, in other words an index that you have not queried for a long time (for instance, select a day one month ago).
- Avoid analyzed fields in bucketing: the cost of bucketing depends on the number of values in the bucketed field, and an analyzed field contains many more values than a raw one;
- Avoid unnecessary bucketing, especially on fields: adding a bucket often means requesting an aggregation based on a new field, so Elasticsearch has to load one more indexed field into RAM for your query. For instance, a date histogram split by `vendor`, then by `target.usr.name`, over 30 days makes Elasticsearch load 4 fields into RAM for each index (30 at minimum), which is heavy. Is the field already filtered in your query useful as a bucket? Clearly not. And what about this destination port that is always the same?
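To see how quickly splits add up, the fields referenced by an aggregation request can be listed before the request is sent. The sketch below walks a request body shaped like the Elasticsearch aggregation DSL. The helper `aggregated_fields`, the aggregation names and the `@timestamp` date field are assumptions made for illustration, and `calendar_interval` is the parameter name used by recent Elasticsearch versions (older ones use `interval`).

```python
def aggregated_fields(aggs):
    """Collect every field referenced by a (possibly nested) aggregation tree."""
    fields = set()
    for definition in aggs.values():
        for key, body in definition.items():
            if key in ("aggs", "aggregations"):
                # Sub-aggregations: recurse into the nested buckets.
                fields |= aggregated_fields(body)
            elif isinstance(body, dict) and "field" in body:
                fields.add(body["field"])
    return fields


# Date histogram split by vendor, then by target user name.
request_aggs = {
    "per_day": {
        "date_histogram": {"field": "@timestamp", "calendar_interval": "1d"},
        "aggs": {
            "by_vendor": {
                "terms": {"field": "vendor"},
                "aggs": {
                    "by_user": {"terms": {"field": "target.usr.name"}},
                },
            },
        },
    },
}

print(sorted(aggregated_fields(request_aggs)))
```

Each extra split adds one more entry to this set, and every entry has to be loaded for each index covered by the time frame.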
Although a dashboard is a convenient way to get a quick glance at your data, some good habits are useful when dealing with different kinds of data:
- Obviously, when a visualization monopolizes a lot of resources, there is no magic: the dashboard rendering it will be slow as well. It quickly gets much worse when several heavy visualizations are stacked in a single dashboard;
- More generally, consider the number of visualizations needed in a dashboard: will the user scroll through the dashboard? If not, the visualizations below the visible window are not needed;
- Mind the time frame: the shorter the better, especially when the time frame covers several indexes (7 days ago ...);
- Finally, mind the refresh interval (especially for dashboards displayed on a static device). A per-second refresh rate is rather high; prefer 10-minute, hourly, or even daily refresh rates.
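The combined effect of these habits can be estimated with simple arithmetic. The sketch below is a rough upper bound, not a measurement: it assumes that every visualization sends one request per refresh, that each request searches every index in the time frame, and that nothing is served from cache. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def refreshes_per_day(refresh_seconds):
    """Number of dashboard refreshes in 24 hours for a given interval."""
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    return SECONDS_PER_DAY // refresh_seconds


def index_searches_per_day(visualizations, indexes_in_time_frame, refresh_seconds):
    """Upper bound on per-index searches triggered by one dashboard in a day."""
    return (visualizations
            * indexes_in_time_frame
            * refreshes_per_day(refresh_seconds))


# 6 visualizations over a 7-index time frame.
print(index_searches_per_day(6, 7, 1))     # refreshed every second
print(index_searches_per_day(6, 7, 600))   # refreshed every 10 minutes
```

Under these assumptions, moving the same dashboard from a per-second refresh to a 10-minute refresh divides the load by 600.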