
Elastalert

Presentation

ElastAlert is an open source project from Yelp. It is an Elasticsearch query wrapper: it reads YAML alerting rules and searches for matching patterns in your Elasticsearch data. It lets you fire alerts whenever the conditions defined by a rule are met. Refer to the ElastAlert documentation.

Rule types

| Mode | Description |
|------|-------------|
| Any | The any rule will match everything. Every hit that the query returns will generate an alert. |
| Blacklist | The blacklist rule will check a certain field against a blacklist, and match if it is in the blacklist. |
| Whitelist | Similar to blacklist, this rule will compare a certain field to a whitelist, and match if the list does not contain the term. |
| Change | This rule will monitor a certain field and match if that field changes. The field must change with respect to the last event with the same query_key. |
| Frequency | This rule matches when there are at least a certain number of events in a given time frame. This may be counted on a per-query_key basis. |
| Spike | This rule matches when the volume of events during a given time period is spike_height times larger or smaller than during the previous time period. It uses two sliding windows, called "reference" and "current", to compare the reference and current frequency of events. |
| Flatline | This rule matches when the total number of events is under a given threshold for a time period. |
| New Term | This rule matches when a value that has never been seen before appears in a field. When ElastAlert starts, it will use an aggregation query to gather all known terms for a list of fields. |
| Cardinality | This rule matches when the total number of unique values for a certain field within a time frame is higher or lower than a threshold. |
| Metric Aggregation | This rule matches when the value of a metric within the calculation window is higher or lower than a threshold. By default the calculation window is buffer_time. |
| Spike Aggregation | This rule matches when the value of a metric within the calculation window is spike_height times larger or smaller than during the previous time period. |
| Percentage Match | This rule matches when the percentage of documents in the match bucket within a calculation window is higher or lower than a threshold. By default the calculation window is buffer_time. |
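As an illustration of how a rule type translates into a rule file, here is a minimal sketch of a frequency rule. The rule name, the thresholds and the filter are assumptions chosen for the example; `num_events`, `timeframe`, `query_key` and `filter` are standard ElastAlert options.

```yaml
# Hypothetical rule: name, thresholds and filter are illustrative only
name: Too many error events per vendor

# Match when at least num_events events occur within timeframe
type: frequency
index: events-*

# Assumed threshold: 50 events within 5 minutes
num_events: 50
timeframe:
  minutes: 5

# Count events separately for each vendor
query_key: vendor

# Only consider documents whose type field is "error" (assumed field value)
filter:
- term:
    type: "error"

alert:
- command

command:
  - "/bin/echo"
  - "Alert!!!"
```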

Connector list

Elastalert connector
Alert Subject
Alert Content
Command
Email
Jira
OpsGenie
SNS
HipChat
Stride
MS Teams
Slack
Mattermost
Telegram
GoogleChat
PagerDuty
PagerTree
Exotel
Twilio
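A connector is selected in a rule through the `alert` option, followed by the connector-specific settings. The fragment below is a sketch of the Email connector combined with a custom alert subject. The addresses and SMTP host are placeholders, and the `vendor` field is assumed to exist in the matched documents.

```yaml
# Fragment of a rule file: placeholder addresses and host
alert:
- email

# Recipients of the alert
email:
- "ops@example.com"

# SMTP settings (assumed values)
smtp_host: "smtp.example.com"
smtp_port: 25
from_addr: "elastalert@example.com"

# Custom subject; {0} is replaced by the value of the first field
# listed in alert_subject_args
alert_subject: "ElastAlert match for vendor {0}"
alert_subject_args:
- vendor
```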

Set up alerts in PunchPlatform

Development mode

If you want to test a rule or simply try ElastAlert in development mode, check this guide.

Info

This feature is only available in standalone

Production-ready mode

In production mode, the PunchPlatform runs ElastAlert through Shiva to ensure high availability.

You must create your own channel containing your ElastAlert configuration and your specific rules. The following folder layout highlights the ElastAlert-related resource files in a channel:

elastalert_channel
├── channel_structure
├── myconfig.yaml
└── rules
    └── /** custom rules **/

Take a look at the Channel configuration documentation for more details about channel configuration.
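For orientation, here is a sketch of what a global ElastAlert configuration file such as `myconfig.yaml` could contain. The option names are standard ElastAlert settings, but every value is an assumption (host, port, intervals), and the way the PunchPlatform channel references this file is not shown here.

```yaml
# Hypothetical myconfig.yaml: all values are illustrative

# Folder containing the rule files, relative to this file
rules_folder: rules

# How often ElastAlert queries Elasticsearch
run_every:
  minutes: 1

# Size of the query window, ending at the time of each run
buffer_time:
  minutes: 15

# Elasticsearch node to query (assumed address)
es_host: localhost
es_port: 9200

# Index where ElastAlert stores its own state and alert history
writeback_index: elastalert_status

# How long ElastAlert keeps retrying a failed alert
alert_time_limit:
  days: 2
```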

Examples

Generate alerts when the crossing time is too high

All entry points (spouts) generate latency metrics. Supervising these metrics is a good monitoring use case.

# (Required)
# Rule name, must be unique
name: The crossing time is too high

# This option allows you to ignore repeating alerts for a period of time
realert:
  minutes: 17

# This option causes the value of realert to exponentially increase while alerts continue to fire.
exponential_realert:
  hours: 1

# timestamp settings
timestamp_field: _timestamp
timestamp_type: iso

# (Required)
# Type of alert.
# The metric_aggregation rule matches when the value of a metric within the calculation window is higher or lower than a threshold. By default the calculation window is buffer_time.
type: metric_aggregation

# (Required)
# Index to search, wildcard supported
index: metrics-*

# (Required, metric_aggregation specific)
# This is the name of the field over which the metric value will be calculated. The underlying type of this field must be supported by the specified aggregation type.
metric_agg_key: autotest_latency.ts_diff

# (Required, metric_aggregation specific)
# Group metric calculations by this field. For each unique value of the query_key field, the metric will be calculated and evaluated separately against the threshold(s).
query_key: tags.autotest_latency_path.start

# (Required, metric_aggregation specific)
# The type of metric aggregation to perform on the metric_agg_key field. This must be one of ‘min’, ‘max’, ‘avg’, ‘sum’, ‘cardinality’, ‘value_count’.
metric_agg_type: max

# (Required, metric_aggregation specific)
# Specify the _type of document to search for.
doc_type: autotest_latency

# (Required, metric_aggregation specific)
# If the calculated metric value is greater than this number, an alert will be triggered. This threshold is exclusive.
max_threshold: 5000

# (Optional, metric_aggregation specific)
# This setting will only have an effect if use_run_every_query_size is false and buffer_time is greater than run_every. If true will allow the start of the metric calculation window to overlap the end time of a previous run. By default the start and end times will not overlap, so if the time elapsed since the last run is less than the metric calculation window size, rule execution will be skipped (to avoid calculations on partial data).
allow_buffer_time_overlap: true

# (Required)
# The alert to use when a match is found
alert:
- command

command:
  - "/bin/echo"
  - "Alert!!!"

Generate per vendor alerts when the traffic changes

Here is a rule that sends an alert when the number of events per second (EPS) for a vendor spikes up or down.

# (Required)
# Rule name, must be unique
name: The number of EPS has changed a lot

# This option allows you to ignore repeating alerts for a period of time
realert:
  minutes: 17

# This option causes the value of realert to exponentially increase while alerts continue to fire.
exponential_realert:
  hours: 1

# timestamp settings
timestamp_field: _timestamp
timestamp_type: iso

# (Required)
# Type of alert.
# The spike rule matches when the volume of events during a given time period is spike_height times larger or smaller than during the previous time period.
type: spike

# (Required)
# Index to search, wildcard supported
index: events-*

# (Required, spike specific)
# spike_height: The ratio of number of events in the last timeframe to the previous timeframe that when hit will trigger an alert.
spike_height: 2

# (Required, spike specific)
# Either ‘up’, ‘down’ or ‘both’. ‘Up’ meaning the rule will only match when the number of events is spike_height times higher. ‘Down’ meaning the reference number is spike_height higher than the current number. ‘Both’ will match either.
spike_type: both

# (Required, spike specific)
# The rule will average out the rate of events over this time period. For example, hours: 1 means that the ‘current’ window will span from present to one hour ago, and the ‘reference’ window will span from one hour ago to two hours ago. The rule will not be active until the time elapsed from the first event is at least two timeframes. This is to prevent an alert being triggered before a baseline rate has been established. This can be overridden using alert_on_new_data.
timeframe:
  minutes: 1

# The minimum number of events that must exist in the reference window for an alert to trigger. For example, if spike_height: 3 and threshold_ref: 10, then the ‘reference’ window must contain at least 10 events and the ‘current’ window at least three times that for an alert to be triggered.
threshold_ref: 10

# This option is only used if query_key is set. When this is set to true, any new query_key encountered may trigger an immediate alert. When set to false, baseline must be established for each new query_key value, and then subsequent spikes may cause alerts. Baseline is established after timeframe has elapsed twice since first occurrence.
alert_on_new_data: true

# Specify the _type of document to search for. This must be present if use_count_query or use_terms_query is set.
doc_type: log

#  If true, ElastAlert will make an aggregation query against Elasticsearch to get counts of documents matching each unique value of query_key. This must be used with query_key and doc_type. This will only return a maximum of terms_size, default 50, unique terms.
use_terms_query: true

# Counts of documents will be stored independently for each value of query_key.
query_key: vendor


# (Required)
# The alert to use when a match is found
alert:
- command

command:
  - "/bin/echo"
  - "Alert!!!"
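If only sudden increases matter, the spike settings above could be narrowed. The fragment below is a sketch rather than part of the original rule: it keeps the same `spike_height` and `timeframe`, matches upward spikes only, and uses the ElastAlert `threshold_cur` option with an assumed value to ignore spikes on very low volumes.

```yaml
# Hypothetical variation of the spike settings: values are illustrative
spike_height: 2

# Only match when the current window is spike_height times
# larger than the reference window
spike_type: up

timeframe:
  minutes: 1

# Assumed value: the 'current' window must contain at least 20 events
threshold_cur: 20
```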

Generate alerts when logs are not parsed

From a security point of view, monitoring unparsed logs is very important. Here is a rule that does so.

# (Required)
# Rule name, must be unique
name: At least one log is not properly parsed

# This option allows you to ignore repeating alerts for a period of time
realert:
  minutes: 17

# This option causes the value of realert to exponentially increase while alerts continue to fire.
exponential_realert:
  hours: 1

# timestamp settings
timestamp_field: ts
timestamp_type: iso

# (Required)
# Type of alert.
# The blacklist rule checks a certain field against a blacklist and matches if the field value is in the blacklist.
type: blacklist

# (Required)
# Index to search, wildcard supported
index: events-*

# (Required, blacklist specific)
# The name of the field to use to compare to the blacklist. If the field is null, those events will be ignored.
compare_key: type

# (Required, blacklist specific)
# A list of blacklisted values. The compare_key term must be equal to one of these values for it to match.
blacklist: 
  - "error"

# (Required)
# The alert to use when a match is found
alert:
- command

command:
  - "/bin/echo"
  - "Alert!!!"
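The same check could also be expressed with the any rule and a filter, which matches every document returned by the query. This is a sketch of an alternative, not the recommended form: the rule name is hypothetical, and it assumes, like the blacklist rule above, that unparsed logs carry the value "error" in the `type` field.

```yaml
# Hypothetical alternative using the any rule
name: Unparsed log detected (any rule variant)

# Every hit returned by the query generates a match
type: any
index: events-*

timestamp_field: ts
timestamp_type: iso

# Restrict the query to documents whose type field is "error"
filter:
- term:
    type: "error"

# Ignore repeating alerts for a period of time
realert:
  minutes: 17

alert:
- command

command:
  - "/bin/echo"
  - "Alert!!!"
```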