Elastic Query Stats

Before you start...

Before using...

The elastic_query_stats node enables you to benchmark an Elasticsearch query. It returns statistics that help you optimize your queries.

Statistical and meta values returned:

This node outputs a single dataset with the following columns:

  • component:

    Description: [Always] The component name.

  • count:

    Description: [Optional] Total number of documents in the selected index

  • hits.total:

    Description: [Always] Number of documents matched by the request

  • hits.max_score:

    Description: [Always] Highest Elasticsearch relevance score among the documents matched by the request

  • query:

    Description: [Always] Request sent, useful for debugging

  • timestamp:

    Description: [Always] Unix timestamp, in milliseconds

  • took:

    Description: [Always] Response time

PySpark ->

Spark ->

Examples

Use-cases

Our "hello world" punchline configuration.

beginner_use_case.punchline

{
  type: punchline
  version: "6.0"
  runtime: spark
  tenant: default
  dag: [
    {
      type: elastic_query_stats
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        index: mytenant-events*
        nodes: [
          localhost
        ]
        count_value: true
        query: {
          query: {
            bool: {
              must: [
                {
                  range: {
                    @timestamp: {
                      gte: now-1h
                      lt: now
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  ]
}

This configuration outputs a dataframe with a single row, whose columns contain the statistics of the Elasticsearch query.

Check it out with the instructions below:

CONF=beginner_use_case.punchline
punchlinectl start -p $CONF

After execution, notice that the returned values are statistics on the query specified in your configuration file: response time, total hits, and so on.

Coming soon

Parameters

Common Settings

| Name | Type | Mandatory | Default value | Description |
|---|---|---|---|---|
| count_value | Boolean | false | false | Set to true to get the total number of documents in the selected index. Enabling this option sends an additional request to Elasticsearch to retrieve that total, which may bias the statistical results in some circumstances. |
| index | String | true | NONE | Name of the Elasticsearch index to query. To add a document type, append /<type> to the index name. |
| port | Integer | false | 9200 | Port of your Elasticsearch server. |
| query | String - Json | false | match all | A valid Elasticsearch query. |
| nodes | List of String | true | NONE | Hostnames of your Elasticsearch nodes. In general, only one hostname is needed. |
| elastic_settings | str(K)-str(V) | false | NONE | Key-value arguments to control the Elasticsearch client. |
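
To illustrate how the common settings combine, here is a minimal sketch of a node that leaves out `query` (so the default "match all" applies), sets the port explicitly, and passes one client option through `elastic_settings`. The index and host are reused from the example on this page, and the `es.size` value is arbitrary. Writing `elastic_settings` as a nested block of string keys and string values is an assumption derived from its `str(K)-str(V)` type, so check it against your punchline version.

```
{
  type: elastic_query_stats
  component: input
  publish: [
    {
      stream: data
    }
  ]
  settings: {
    index: mytenant-events*
    nodes: [
      localhost
    ]
    # 9200 is already the default; shown only for illustration
    port: 9200
    # false (the default) avoids the additional count request
    count_value: false
    # assumed layout: string keys mapped to string values
    elastic_settings: {
      es.size: "100"
    }
  }
}
```

Leaving `count_value` at false keeps the measurement to a single request, which avoids the possible bias described for that option.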

Advanced Settings

| Elastic settings | Type | Default value | Description |
|---|---|---|---|
| es.path.prefix | String | NONE | /something/to/append, in case your Elasticsearch servers are behind a proxy |
| es.size | String | 50 | Size of the Elasticsearch query, or size of each scroll query |
| es.scroll | String | false | Enable scrolling requests |
| es.scroll.keepalive | String | 10m | How long each scroll query should be kept alive; can be 1m, 1d, 1y, etc. |
| es.net.ssl | String | false | Enable SSL |
| es.net.http.auth.pass | String | NONE | Must be used with es.net.http.auth.user |
| es.net.http.auth.user | String | NONE | Must be used with es.net.http.auth.pass |
| es.net.http.auth.token | String | NONE | Must be used with es.net.http.auth.token_type |
| es.net.http.auth.token_type | String | NONE | Must be used with es.net.http.auth.token |
| es.max_concurrent_shard_requests | String | NONE | Maximum number of shards the node can request at a time |
| es.nodes.resolve.hostname | String | false | Resolve a hostname; make sure /etc/hosts references the proper IP address |
| es.doc_type | String | NONE | Add doc_type to the requested URI; this feature is deprecated by Elastic |
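
As a sketch of how these advanced options might be supplied, the `settings` block below enables SSL and basic authentication for a cluster that requires them. The user name and password are placeholders, the index and host are reused from the example on this page, and nesting the options under `elastic_settings` as string key-value pairs is an assumption based on the `str(K)-str(V)` type of that setting. Values are quoted because every option is typed as String.

```
settings: {
  index: mytenant-events*
  nodes: [
    localhost
  ]
  elastic_settings: {
    # enable SSL between the node and Elasticsearch
    es.net.ssl: "true"
    # user and pass must be set together (placeholder values)
    es.net.http.auth.user: "my_user"
    es.net.http.auth.pass: "my_password"
  }
}
```

For token-based access, the same pattern would presumably apply with `es.net.http.auth.token` and `es.net.http.auth.token_type` set together in place of the user and password.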