Elastic Query Stats¶
Overview¶
The elastic_query_stats node lets you benchmark an Elasticsearch query. It returns statistics that help you optimize your queries.
The node outputs a single dataset with the following columns:
component
:[Always] The component name.
count
:[Optional] Total number of documents in the selected index
hits.total
:[Always] Number of documents matched by the request
hits.max_score
:[Always] Highest relevance score Elasticsearch returned for the request
query
:[Always] Request sent, useful for debugging
timestamp
:[Always] Unix timestamp in milliseconds
took
:[Always] Response time
Runtime Compatibility¶
- PySpark : ❌
- Spark : ✅
Examples¶
---
type: punchline
version: '6.0'
runtime: spark
tenant: default
dag:
- type: elastic_query_stats
  component: input
  publish:
  - stream: data
  settings:
    index: mytenant-events*
    count_value: true
    query:
      query:
        bool:
          must:
          - range:
              "@timestamp":
                gte: now-1h
                lt: now
This configuration outputs a dataframe with a single row whose columns contain the statistics for the Elasticsearch query.
After execution, the returned values are statistics about the query specified in your configuration file: response time, total hits, and so on.
Parameters¶
Common Settings¶
Name | Type | Mandatory | Default value | Description |
---|---|---|---|---|
count_value | Boolean | false | false | Set to true to get the total number of documents in the selected index. Activating this option sends an additional request to Elasticsearch to retrieve that total, which may bias the statistical results in some circumstances. |
index | String | true | NONE | The name of the Elasticsearch index from which data is fetched. To add a document type, append /<type> to the index name. |
port | Integer | false | 9200 | The port of your Elasticsearch server. |
query | String - Json | false | match all | A valid Elasticsearch query. |
nodes | List of String | true | NONE | Hostnames of your Elasticsearch nodes. In general, only one hostname is needed. |
elastic_settings | str(K)-str(V) | false | NONE | Key-value arguments to control the Elasticsearch client. |
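The common settings above can be combined as in the following sketch of a single dag entry. It is illustrative only: the host name `localhost` is a placeholder, and writing `nodes` as a YAML list is an assumption based on its "List of String" type. `count_value` is left at false here so that no additional count request is sent.

```yaml
# Sketch only: host name and index pattern are placeholders.
- type: elastic_query_stats
  component: input
  publish:
  - stream: data
  settings:
    # Mandatory: at least one Elasticsearch hostname (hypothetical value).
    nodes:
    - localhost
    # Optional: 9200 is already the default.
    port: 9200
    # Mandatory: index or index pattern to query.
    index: mytenant-events*
    # Optional: false avoids the additional document count request.
    count_value: false
    # Optional: an explicit match-all query, equivalent to the documented default.
    query:
      query:
        match_all: {}
```

Setting `count_value` to true adds the `count` column to the output, at the cost of the additional request described in the table.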
Advanced Settings¶
Elastic settings | Type | Default value | Description |
---|---|---|---|
es.path.prefix | String | NONE | /something/to/append in case your elastic servers are behind a proxy |
es.size | String | 50 | size of elastic query or size of each scroll query |
es.scroll | String | false | enable scrolling request |
es.scroll.keepalive | String | 10m | how long each scroll query should be kept alive, can be: 1m, 1d, 1y etc... |
es.net.ssl | String | false | enable ssl |
es.net.http.auth.pass | String | NONE | must be used with es.net.http.auth.user |
es.net.http.auth.user | String | NONE | must be used with es.net.http.auth.pass |
es.net.http.auth.token | String | NONE | must be used with es.net.http.auth.token_type |
es.net.http.auth.token_type | String | NONE | must be used with es.net.http.auth.token |
es.net.ssl.keystore.location | String | NONE | must be a jks, pkcs12 or p12 store and must contain the private and the public key of the node |
es.net.ssl.keystore.pass | String | NONE | do not provide if the keystore is not protected with a password |
es.net.ssl.truststore.location | String | NONE | must be a jks, pkcs12 or p12 store and must contain at least the node certificate and its CA chain, and every other certificate this node should trust |
es.net.ssl.truststore.pass | String | NONE | do not provide if the truststore is not protected with a password |
es.net.ssl.hostname.verification | String | true | whether the node client should resolve the node hostnames to IP addresses or not |
es.max_concurrent_shard_requests | String | NONE | maximum number of shards the node can request at a time |
es.nodes.resolve.hostname | String | false | resolve a hostname: make sure that /etc/hosts references the proper IP address |
es.doc_type | String | NONE | add doc_type to the requested URI; document types are deprecated by Elastic |
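As a rough illustration of how the advanced settings might be passed, the sketch below places them under `elastic_settings` as flat string key-value pairs, which is an assumption drawn from the `str(K)-str(V)` type in the common settings table. The host name, truststore path, passwords, and user name are all hypothetical placeholders.

```yaml
# Sketch only: every value below is a placeholder, not a recommended setting.
settings:
  nodes:
  - es-node-1.example.org                 # hypothetical hostname
  index: mytenant-events*
  elastic_settings:
    # TLS: the truststore must contain the node certificate and its CA chain.
    es.net.ssl: "true"
    es.net.ssl.truststore.location: /path/to/truststore.jks   # hypothetical path
    es.net.ssl.truststore.pass: truststore-password           # omit if the store has no password
    # Basic authentication: user and pass must be provided together.
    es.net.http.auth.user: benchmark-user                     # hypothetical user
    es.net.http.auth.pass: benchmark-password                 # hypothetical password
    # Scrolling: es.size becomes the size of each scroll query.
    es.scroll: "true"
    es.size: "100"
    es.scroll.keepalive: 5m
```

All values are quoted or written as plain strings because the table lists every advanced setting with type String.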