Elastic Query Stats

Compatible Spark

The elastic_query_stats node lets you collect statistics about an Elasticsearch query and load the results into your Spark cluster.

Example

Basic configuration

This configuration outputs a dataframe with a single row whose columns contain statistics about the Elasticsearch query.

{
  job: [
    {
      type: elastic_query_stats
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        index: mytenant-events*
        nodes: [
          localhost
        ]
        count_value: true
        query: {
          query: {
            bool: {
              must: [
                {
                  range: {
                    @timestamp: {
                      gte: now-1h
                      lt: now
                    }
                  }
                }
              ]
            }
          }
          size: 0
        }
      }
    }
  ]
}

For clarity, here is the corresponding Elasticsearch query.

curl -XGET "http://localhost:9200/mytenant-events*/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h",
              "lt": "now"
            }
          }
        }
      ]
    }
  },
  "size": 0
}'

Instead of returning the documents matched by this request, the node retrieves statistics about the query, such as the response time and the total number of hits.

Configuration(s)

  • index: String

    Description: [Required] The name of the Elasticsearch index from which data will be fetched.

  • port: Integer

    Description: [Optional] The port of your Elasticsearch server.

  • type: String

    Description: [Optional] The document type to retrieve from your Elasticsearch index.

  • query: Json

    Description: [Optional] A valid Elasticsearch query.

  • nodes: List

    Description: [Required] Hostnames of your Elasticsearch nodes. In general, only one hostname is needed.

  • count_value: Boolean

    Description: [Optional] Defaults to false. Set to true to get the total number of documents in the selected index.

    Warning

    If you activate this option, an additional request is sent to Elasticsearch to get the total number of documents in the index. This may affect Elasticsearch statistics.
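The required and optional settings listed above can be illustrated with a small check. This is only a sketch, not the node's actual validation logic: the function name `check_settings` is hypothetical, and the only rules it encodes are the ones stated above (index and nodes are required, count_value defaults to false).

```python
# Hypothetical helper: not part of the elastic_query_stats node itself.
REQUIRED_SETTINGS = ("index", "nodes")


def check_settings(settings):
    """Return a copy of the settings with count_value defaulted to False.

    Raises ValueError when a required setting is missing or when
    nodes is not a non-empty list of hostnames.
    """
    missing = [key for key in REQUIRED_SETTINGS if key not in settings]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    if not isinstance(settings["nodes"], list) or not settings["nodes"]:
        raise ValueError("nodes must be a non-empty list of hostnames")
    checked = dict(settings)
    # count_value is optional and documented as false by default.
    checked.setdefault("count_value", False)
    return checked
```

For example, `check_settings({"index": "mytenant-events*", "nodes": ["localhost"]})` returns the same settings with `count_value` set to `False`.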

Return values

This node outputs a single dataset with the following columns:

  • component:

    Description: [Always] The component name.

  • count:

    Description: [Optional] Total number of documents in the selected index. Only present when count_value is set to true.

  • hits.total:

    Description: [Always] Number of documents matched by the request.

  • hits.max_score:

    Description: [Always] Maximum Elasticsearch relevance score for the request.

  • query:

    Description: [Always] The request that was sent, useful for debugging.

  • timestamp:

    Description: [Always] Unix timestamp in milliseconds.

  • took:

    Description: [Always] Response time of the query.
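To make the shape of the output row concrete, here is a minimal sketch of how such a row could be derived from an Elasticsearch search response. It is not the node's implementation: the function name `build_stats_row` is hypothetical, and the mapping of each column to the `took` and `hits` fields of the response is an assumption based on the column names. The sketch accepts `hits.total` both as a plain integer and as the object form (`{"value": ..., "relation": ...}`) used by Elasticsearch 7 and later.

```python
import json
import time


def build_stats_row(component, query, response, count=None):
    """Build one stats row (a dict) from an Elasticsearch search response.

    Hypothetical helper for illustration; column names follow the list above.
    """
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ returns hits.total as an object, older versions as an int.
    if isinstance(total, dict):
        total = total["value"]
    row = {
        "component": component,
        "hits.total": total,
        "hits.max_score": hits.get("max_score"),
        "query": json.dumps(query),             # request sent, kept for debugging
        "timestamp": int(time.time() * 1000),   # Unix timestamp in milliseconds
        "took": response["took"],
    }
    # The count column is optional: it only exists when count_value is enabled.
    if count is not None:
        row["count"] = count
    return row
```

Calling it with a sample response such as `{"took": 12, "hits": {"total": 42, "max_score": None}}` yields a single dictionary whose keys match the columns described above, with `count` present only when a value is passed.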