Python Elastic Input¶
Overview¶
Compatible with PySpark only
This node is intended for pipelines that do not require Spark features, for instance a pipeline that never manipulates Spark DataFrames.
The output of this node is a list of Python dictionaries. Each dictionary is a result returned by a query against an Elasticsearch cluster.
In contrast to the classic input node, this one can subscribe to a node that outputs a list of strings. Each element of the list is used to query Elasticsearch.
If a query is defined on this node and the node is also subscribed to another node publishing a list of queries, both the published queries and the query configured on this node are run against your Elasticsearch cluster.
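The combination rule described above can be sketched in plain Python. This is only an illustration, not the node's actual implementation: the helper name `collect_queries`, the assumption that each subscribed string holds one JSON query, and the ordering (subscribed queries first, then the node's own query) are all hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def collect_queries(
    subscribed: List[str],
    node_query: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return every query to run: the subscribed ones plus the node's own.

    Hypothetical helper; the ordering is an assumption made for illustration.
    """
    # Assumption: each subscribed element is a string holding one JSON query.
    queries = [json.loads(raw) for raw in subscribed]
    # The query configured on the node, if any, is used in addition.
    if node_query is not None:
        queries.append(node_query)
    return queries
```

With one subscribed query and one query set on the node, the helper returns two queries; with neither, it returns an empty list.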
Example(s)¶
1: Multiple queries¶
Below is an example where the Python Elastic Input node is subscribed to a node publishing a list of strings. Each element of the list is a query executed against an Elasticsearch cluster, and the results are then saved in a new index.
Note
Each line of your file should be a valid Elasticsearch query.
    job: [
        {
            type: python_file_input
            component: queries
            publish: [
                {
                    stream: data
                }
            ]
            settings: {
                file_path: /full/path/to/file/query
            }
        }
        {
            type: python_elastic_input
            component: python_elastic_input
            settings: {
                index: mydata
                nodes: [
                    localhost
                ]
            }
            subscribe: [
                {
                    stream: data
                    component: queries
                }
            ]
            publish: [
                {
                    stream: data
                }
            ]
        }
        {
            type: python_elastic_output
            component: python_elastic_output
            settings: {
                nodes: [
                    localhost
                ]
                index: multiquerytest
            }
            subscribe: [
                {
                    stream: data
                    component: python_elastic_input
                }
            ]
        }
    ]
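Since every line of the query file is expected to be a valid query, it can help to check the file before running the job. The sketch below is a hypothetical pre-flight check, not part of the node: `find_invalid_lines` only verifies that each non-empty line parses as a JSON object, which is necessary but not sufficient for a valid Elasticsearch query. Skipping blank lines is an assumption; the node may treat them differently.

```python
import json
from typing import List, Tuple


def find_invalid_lines(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error) for each non-empty line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are ignored in this sketch
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        # A query body is a JSON object, so lists and scalars are rejected.
        if not isinstance(parsed, dict):
            problems.append((number, "expected a JSON object"))
    return problems
```

To check a real file, pass its lines, for example `find_invalid_lines(open(path).read().splitlines())`, where `path` is the value used for `file_path`.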
2: Single query¶
A more straightforward setup: only the settings specified on this node are taken into account at execution.
    {
        job: [
            {
                type: python_elastic_input
                component: python_elastic_input
                settings: {
                    nodes: [
                        localhost
                    ]
                    index: platform-metricbeat-*
                    query: {
                        query: {
                            match_all: {}
                        }
                    }
                }
                publish: [
                    {
                        stream: data
                    }
                ]
            }
            {
                type: python_elastic_output
                component: python_elastic_output
                settings: {
                    nodes: [
                        localhost
                    ]
                    index: singlequery
                }
                subscribe: [
                    {
                        stream: data
                        component: python_elastic_input
                    }
                ]
            }
        ]
    }
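To give an idea of what a single-query run might produce, here is a rough Python sketch of turning one search response into a list of dictionaries. It is not the node's implementation. It assumes a client object whose `search` method accepts `index` and `body` keyword arguments (as the elasticsearch-py 7.x client does), and it assumes each output dictionary is the hit's `_source`; the source does not say whether the node returns the whole hit or only `_source`. `FakeClient` and its sample document are made up so the sketch runs without a cluster.

```python
from typing import Any, Dict, List


def fetch_documents(client: Any, index: str, query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run one query and return the matching documents as dictionaries."""
    response = client.search(index=index, body=query)
    # Search responses list matches under hits.hits, each carrying a _source.
    return [hit["_source"] for hit in response["hits"]["hits"]]


class FakeClient:
    """Stand-in for a real client so the sketch runs without a cluster."""

    def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        # Invented response, shaped like a standard search result.
        return {"hits": {"hits": [{"_index": index, "_source": {"cpu": 0.42}}]}}


documents = fetch_documents(
    FakeClient(),
    "platform-metricbeat-*",
    {"query": {"match_all": {}}},
)
print(documents)
```

Replacing `FakeClient()` with a real client pointed at one of the configured `nodes` would run the same `match_all` query as the configuration above.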
Configuration(s)¶
- index: String
  Description: [Required] The name of your elasticsearch index where data will be fetched.
- port: Integer
  Description: [Optional] Your Elasticsearch server Port.
- type: String
  Description: [Optional] Document type that will be retrieved from your elasticsearch index.
- query: Json
  Description: [Optional] A valid Elasticsearch query.
- nodes: List
  Description: [Required] Hostnames of your elasticsearch nodes. In general, only one hostname is needed.
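The required and optional settings listed above can be checked with a small helper before a job is submitted. This is a hypothetical sketch, not something the platform provides: `check_settings` and the mapping of the documented types to Python types (String to `str`, Integer to `int`, Json to `dict`, List to `list`) are assumptions.

```python
from typing import Any, Dict, List

# Settings from the list above, mapped to assumed Python types.
REQUIRED = {"index": str, "nodes": list}
OPTIONAL = {"port": int, "type": str, "query": dict}


def check_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a python_elastic_input settings block."""
    errors = []
    for name, expected in REQUIRED.items():
        if name not in settings:
            errors.append(f"missing required setting: {name}")
        elif not isinstance(settings[name], expected):
            errors.append(f"{name} should be of type {expected.__name__}")
    for name, expected in OPTIONAL.items():
        # Optional settings are only type-checked when present.
        if name in settings and not isinstance(settings[name], expected):
            errors.append(f"{name} should be of type {expected.__name__}")
    return errors
```

For example, `check_settings({"index": "mydata", "nodes": ["localhost"]})` returns an empty list, while omitting `index` reports it as a missing required setting.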