Skip to content

Python Elastic Input

Overview

Compatible Pyspark only

This node is intended to be used when spark features are not required. For instance, in your pipeline, you are not manipulating Spark's dataframe at all...

The resulting output of this node is a list of python dictionary. Each dictionary is the result queried against an elasticsearch cluster

In contrast to classic input node, this one can be subscribed to a node outputting a list of string. Each element of the list will be use to query elasticsearch.

In case a query is defined by this node and at the same time it is subscribed to another one publishing a list of queries: the list of queries alongside the query set up on this node will be used on your elasticsearch cluster.

Example(s)

1: Multiple queries

Below is an example where our Python Elastic Input Node is subscribed to a node publishing a list of string and each element of the list is a query to be executed against an elasticsearch cluster. The result is then saved in a new index.

Note

Each line of your file should be a valid elasticsearch query

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
job: [
    {
        type: python_file_input
        component: queries
        publish: [
            {
                stream: data
            }
        ]
        settings: {
            file_path: /full/path/to/file/query
        }
    }
    {
        type: python_elastic_input
        component: python_elastic_input
        settings: {
            index: mydata
            nodes: [
                localhost
            ]
        }
        subscribe: [
            {
                stream: data
                component: queries
            }
        ]
        publish: [
            {
                stream: data
            }
        ]
    }
    {
        type: python_elastic_output
        component: python_elastic_output
        settings: {
            nodes: [
                localhost
            ]
            index: multiquerytest
        }
        subscribe: [
            {
                stream: data
                component: python_elastic_input
            }
        ]
    }
]

2: Single query

A more straight-forward solution. Only the settings specified on this node will be taken into account upon execution.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
{
    job: [
        {
            type: python_elastic_input
            component: python_elastic_input
            settings: {
                nodes: [
                    localhost
                ]
                index: platform-metricbeat-*
                query: {
                    query: {
                        match_all: {}
                    }
                }
            }
            publish: [
                {
                    stream: data
                }
            ]
        }
        {
            type: python_elastic_output
            component: python_elastic_output
            settings: {
                nodes: [
                    localhost
                ]
                index: singlequery
            }
            subscribe: [
                {
                    stream: data
                    component: python_elastic_input
                }
            ]
        }
    ]
}

Configuration(s)

  • index: String

    Description: [Required] The name of your elasticsearch index where data will be fetched.

  • port: Integer

    Description: [Optional] Your Elasticsearch server Port.

  • type: String

    Description: [Optional] Document type that will be retrieved from your elasticsearch index.

  • query: Json

    Description: [Optional] A valid Elasticsearch query.

  • nodes: List

    Description: [Required] Hostnames of your elasticsearch nodes. In general, only one hostname is needed.