
File Output

The file_output node writes an incoming dataset as files inside a specified folder.

{
    job: [
        {
            type: file_output
            component: output
            settings: {
                // Output format: json, csv, parquet or orc
                format: csv

                // Folder where the output files are written
                folder_path: ./folder

                // Number of partitions the dataset is split into before writing (1 gives a single file)
                number_of_repartition: 1

                // Overwrite, Append, ErrorIfExists, Ignore
                // Check https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/SaveMode.html
                save_mode: overwrite
            }
            subscribe: [
                {
                    component: input
                    stream: data
                }
            ]
        }
    ]
}
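The example writes into `./folder` rather than to one named file, and `number_of_repartition` decides how many files end up there. The sketch below imitates that layout with the Python standard library only: rows are dealt round-robin into N partitions (an assumed simplification of how the dataset is redistributed before writing) and each non-empty partition becomes one CSV file. The helper `write_partitioned_csv` and the `part-00000.csv` naming are hypothetical illustrations, not part of the node.

```python
import csv
import os
import tempfile


def write_partitioned_csv(rows, folder_path, number_of_repartition):
    """Hypothetical helper: split rows into partitions and write one CSV file per partition.

    Rows are assigned round-robin, an assumed simplification of the real
    repartitioning. Empty partitions produce no file. Returns the written paths.
    """
    if number_of_repartition < 1:
        raise ValueError("number_of_repartition must be at least 1")
    partitions = [[] for _ in range(number_of_repartition)]
    for index, row in enumerate(rows):
        partitions[index % number_of_repartition].append(row)

    os.makedirs(folder_path, exist_ok=True)
    written = []
    for part_id, part_rows in enumerate(partitions):
        if not part_rows:
            continue  # nothing to write for an empty partition
        path = os.path.join(folder_path, f"part-{part_id:05d}.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(part_rows)
        written.append(path)
    return written


if __name__ == "__main__":
    rows = [[i, f"name-{i}"] for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        many = write_partitioned_csv(rows, os.path.join(tmp, "many"), 3)
        single = write_partitioned_csv(rows, os.path.join(tmp, "single"), 1)
        print(len(many), len(single))  # 3 1
```

With `number_of_repartition: 1` every row lands in the same partition, which is why a value of one yields a single file; larger values spread the rows across several files in the same folder.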

Configuration

  • format: String

    Description: [Required] the file format used to write the incoming dataset. Supported values: json, csv, parquet, orc.

  • folder_path: String

    Description: [Required] path of the folder on your filesystem where the output files are stored.

  • number_of_repartition: Integer

    Description: [Required] the number of partitions the dataset is split into before it is written. Each partition is written as a separate file, so setting this option to 1 produces a single output file.

  • save_mode: String

    Description: [Required] what the node does when data already exists in folder_path:

    1) overwrite: if data already exists in folder_path, it is replaced by the contents of the dataset.
    
    2) append: if data already exists in folder_path, the contents of the dataset are added to it.
    
    3) errorifexists: if data already exists in folder_path, the write fails with an exception.
    
    4) ignore: if data already exists in folder_path, the save operation is skipped, so the dataset is not written and the existing data stays unchanged.
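To make the four modes concrete, here is a stdlib-only Python sketch that applies the same rules to a single file, `data.txt`, inside the target folder. The `save` and `read_lines` helpers and the file name are hypothetical; the real node works on the whole output folder through Spark, and matching the mode name case-insensitively is an assumption.

```python
import os
import tempfile

SAVE_MODES = {"overwrite", "append", "errorifexists", "ignore"}


def save(folder_path, lines, save_mode):
    """Hypothetical helper: write lines to folder_path/data.txt following the save-mode rules."""
    mode = save_mode.lower()  # assumption: mode names are matched case-insensitively
    if mode not in SAVE_MODES:
        raise ValueError(f"unknown save_mode: {save_mode}")
    os.makedirs(folder_path, exist_ok=True)
    target = os.path.join(folder_path, "data.txt")
    exists = os.path.exists(target)

    if exists and mode == "errorifexists":
        raise FileExistsError(target)  # existing data: fail without writing
    if exists and mode == "ignore":
        return  # existing data: skip the write and leave it unchanged
    # append adds to existing data; the other modes reach this point only to write a fresh file
    file_mode = "a" if mode == "append" else "w"
    with open(target, file_mode) as handle:
        handle.writelines(line + "\n" for line in lines)


def read_lines(folder_path):
    """Return the lines currently stored in folder_path/data.txt."""
    with open(os.path.join(folder_path, "data.txt")) as handle:
        return handle.read().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save(tmp, ["a"], "overwrite")
        save(tmp, ["b"], "append")
        save(tmp, ["c"], "ignore")
        print(read_lines(tmp))  # ['a', 'b']
```

Note that ignore and errorifexists only differ from overwrite when data is already present; on an empty folder all four modes simply write the dataset.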