Elastic Query Stats

Overview

The elastic_query_stats node lets you benchmark an Elasticsearch query. It returns statistics that help you optimize your queries.

The node outputs a single dataset with the following columns:

  • component:

    [Always] The component name.

  • count:

    [Optional] Total number of documents in the selected index (requested with the count_value setting)

  • hits.total:

    [Always] Number of documents matched by the request

  • hits.max_score:

    [Always] Highest Elasticsearch relevance score among the documents matched by the request

  • query:

    [Always] Request sent, useful for debugging

  • timestamp:

    [Always] Unix timestamp in milliseconds

  • took:

    [Always] Response time

Runtime Compatibility

  • PySpark :
  • Spark :

Examples

---
type: punchline
version: '6.0'
runtime: spark
tenant: default
dag:
- type: elastic_query_stats
  component: input
  publish:
  - stream: data
  settings:
    index: mytenant-events*
    count_value: true
    query:
      query:
        bool:
          must:
          - range:
              "@timestamp":
                gte: now-1h
                lt: now

This configuration outputs a dataframe with a single row whose columns contain the statistics for the Elasticsearch query.

After execution, the returned values are statistics about the query specified in your configuration file: response time, total hits, and so on.

Parameters

Common Settings

| Name | Type | Mandatory | Default value | Description |
|---|---|---|---|---|
| count_value | Boolean | false | false | Set to true to get the total number of documents in the selected index. Enabling this option sends an additional request to Elasticsearch to retrieve that total, so statistical results may be biased in some circumstances. |
| index | String | true | NONE | The name of the Elasticsearch index from which data is fetched. To add a document type, append /<type> to the index name. |
| port | Integer | false | 9200 | Your Elasticsearch server port. |
| query | String - Json | false | match all | A valid Elasticsearch query. |
| nodes | List of String | true | NONE | Hostnames of your Elasticsearch nodes. In general, only one hostname is needed. |
| elastic_settings | str(K)-str(V) | false | NONE | Key-value arguments to control the Elasticsearch client. |
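
As an illustration of the common settings, the fragment below is a minimal sketch of a node entry that sets both mandatory settings (index and nodes) and spells out the optional ones with their documented defaults. The hostname is a placeholder, and writing the default query as an explicit match_all is an assumption about how the "match all" default is expressed.

```yaml
- type: elastic_query_stats
  component: input
  publish:
  - stream: data
  settings:
    # Mandatory settings
    index: mytenant-events*
    nodes:
    - localhost            # placeholder hostname, replace with one of your nodes
    # Optional settings, shown here with their documented default values
    port: 9200
    count_value: false
    query:
      query:
        match_all: {}      # assumed explicit form of the "match all" default
```

Leaving count_value at false avoids the additional count request, which keeps the measured statistics limited to the query being benchmarked.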

Advanced Settings

| Elastic settings | Type | Default value | Description |
|---|---|---|---|
| es.path.prefix | String | NONE | /something/to/append, in case your Elasticsearch servers are behind a proxy. |
| es.size | String | 50 | Size of the Elasticsearch query, or size of each scroll query. |
| es.scroll | String | false | Enable scrolling requests. |
| es.scroll.keepalive | String | 10m | How long each scroll query should be kept alive, for example 1m, 1d or 1y. |
| es.net.ssl | String | false | Enable SSL. |
| es.net.http.auth.pass | String | NONE | Must be used with es.net.http.auth.user. |
| es.net.http.auth.user | String | NONE | Must be used with es.net.http.auth.pass. |
| es.net.http.auth.token | String | NONE | Must be used with es.net.http.auth.token_type. |
| es.net.http.auth.token_type | String | NONE | Must be used with es.net.http.auth.token. |
| es.net.ssl.keystore.location | String | NONE | Must be a jks, pkcs12 or p12 store and must contain the private and the public key of the node. |
| es.net.ssl.keystore.pass | String | NONE | Do not provide if the keystore is not protected with a password. |
| es.net.ssl.truststore.location | String | NONE | Must be a jks, pkcs12 or p12 store and must contain at least the node certificate and its CA chain, and every other certificate this node should trust. |
| es.net.ssl.truststore.pass | String | NONE | Do not provide if the truststore is not protected with a password. |
| es.net.ssl.hostname.verification | String | true | Whether the node client should resolve the node hostnames to IP addresses or not. |
| es.max_concurrent_shard_requests | String | NONE | Maximum number of shards the node can request at a time. |
| es.nodes.resolve.hostname | String | false | Resolve a hostname. Make sure that /etc/hosts references the proper IP address. |
| es.doc_type | String | NONE | Add doc_type to the requested URI. This feature is deprecated by Elastic. |
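
The advanced settings are passed as key-value pairs under elastic_settings. The fragment below is a sketch of a node entry that queries a cluster over SSL with basic authentication. It assumes that elastic_settings is written as a YAML map of string keys to string values, as the str(K)-str(V) type suggests. The hostname, file path, user name and passwords are placeholders, not values from a real deployment.

```yaml
- type: elastic_query_stats
  component: input
  publish:
  - stream: data
  settings:
    index: mytenant-events*
    nodes:
    - es-node.example.org                      # placeholder hostname
    elastic_settings:
      # Values are quoted because the settings are documented as strings
      es.size: "100"
      es.net.ssl: "true"
      es.net.ssl.truststore.location: /path/to/truststore.jks   # placeholder path
      es.net.ssl.truststore.pass: "changeit"   # omit if the truststore has no password
      # User and password must be provided together
      es.net.http.auth.user: "benchmark_user"  # placeholder credentials
      es.net.http.auth.pass: "benchmark_pass"
```

The token settings (es.net.http.auth.token and es.net.http.auth.token_type) follow the same pairing rule and would replace the user and password pair in a token-based setup.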