
Python File Input

Overview

Compatible with PySpark only

This node is intended for pipelines that do not require Spark features, for instance when you are not manipulating Spark DataFrames at all.

The output of this node is a list of strings, where each element is one line of your file.
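
Conceptually, the node behaves like the short Python sketch below: it reads the configured file and publishes its lines as a list of strings. This is an illustrative sketch, not the node's actual implementation; the function name and the stripping of trailing newlines are assumptions.

def read_file_as_lines(file_path: str) -> list[str]:
    # Illustrative sketch only: produces one string per line of the file.
    with open(file_path, "r", encoding="utf-8") as handle:
        # Assumption: trailing newlines are stripped from each element.
        return [line.rstrip("\n") for line in handle]

lines = read_file_as_lines("/full/path/to/file/query")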

Example(s)

1: Multiple queries

Below is an example where our Python Elastic Input node subscribes to our Python File Input node. The file input node publishes a list of strings, each element being a query to execute against an Elasticsearch cluster. The results are then saved in a new index.

Note

Each line of your file must be a valid Elasticsearch query.

job: [
    {
        type: python_file_input
        component: queries
        publish: [
            {
                stream: data
            }
        ]
        settings: {
            file_path: /full/path/to/file/query
        }
    }
    {
        type: python_elastic_input
        component: python_elastic_input
        settings: {
            index: mydata
            nodes: [
                localhost
            ]
        }
        subscribe: [
            {
                stream: data
                component: queries
            }
        ]
        publish: [
            {
                stream: data
            }
        ]
    }
    {
        type: python_elastic_output
        component: python_elastic_output
        settings: {
            nodes: [
                localhost
            ]
            index: multiquerytest
        }
        subscribe: [
            {
                stream: data
                component: python_elastic_input
            }
        ]
    }
]
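
In this example, the file at /full/path/to/file/query would contain one Elasticsearch query per line. The Python sketch below shows a hypothetical way to build such a file; the query bodies, the status field, and the assumption that python_elastic_input expects one JSON query-DSL document per line are illustrative only.

import json

# Hypothetical queries; adapt them to your own index and mappings.
queries = [
    {"query": {"match_all": {}}},
    {"query": {"term": {"status": "active"}}},
]

with open("/full/path/to/file/query", "w", encoding="utf-8") as handle:
    for query in queries:
        # One JSON-encoded Elasticsearch query per line, as expected by the node.
        handle.write(json.dumps(query) + "\n")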

Configuration(s)

  • file_path: String

    Description: [Required] Full path to the file you want to ingest.