Python Elastic Input¶
Overview¶
This node can be used for simple use cases where you do not need to manipulate dataframe APIs.
The resulting output of this node is a list of python dictionaries. Each dictionary is the result queried against an elasticsearch cluster.
In contrast to classic input node type, this one can subscribe to a node publishing a list of strings. Each element of the list will be used to query elasticsearch.
In case a query is defined by this node and at the same time it is subscribed to another one publishing a list of queries: the list of queries alongside the query set up on this node will be used on your elasticsearch cluster.
Runtime Compatibility¶
- PySpark : ✅
- Spark : ❌
Examples¶
Basic¶
---
type: punchline
version: '6.0'
runtime: pyspark
dag:
- type: python_elastic_input
component: python_elastic_input
settings:
index: platform-metricbeat-*
query:
query:
match_all: {}
publish:
- stream: data
- type: python_elastic_output
component: python_elastic_output
settings:
index: singlequery
subscribe:
- stream: data
component: python_elastic_input
Custom Fields Selection¶
This node can also be configured in order to retrieve only specific fields from Elasticsearch response. Additionnal fields like for example a timestamp or the number of document in requested index can also be retrieved. This feature is usefull if you want to bench Elasticsearch.
---
type: punchline
version: '6.0'
runtime: pyspark
dag:
- type: python_file_input
component: queries
publish:
- stream: data
settings:
file_path: "/full/path/to/file/query"
- type: python_elastic_input
component: python_elastic_input
settings:
index: mydata
nodes:
- localhost
subscribe:
- stream: data
component: queries
publish:
- stream: data
- type: python_elastic_output
component: python_elastic_output
settings:
nodes:
- localhost
index: multiquerytest
subscribe:
- stream: data
component: python_elastic_input
Parameters¶
Name | Type | mandatory | Default value | Description |
---|---|---|---|---|
index | String | true | NONE | The name of your elasticsearch index where data will be fetched. |
port | Integer | false | 9200 | Your Elasticsearch server Port. |
query | String - Json | false | match all | A valid Elasticsearch query. |
nodes | List of String | true | NONE | Hostnames of your elasticsearch nodes. In general, only one hostname is needed. |
type | String | false | NONE | document type that will be retrieved from elasticsearch index |
timestamp_field | Boolean | false | false | adds a timestamp field to your json document. |
output_fields | List of String | false | NONE | List of fields retrieved from Elasticsearch response. ie hits.hits, etc... |
count_field | Boolean | false | false | If true, add a count field in response which count total number of document in index. |
node_id | String | false | NONE | If set, add a node_id field in response which set a id to current node. Must be unique in order to differentiate requests in visualization. |