File Output

Overview

The file_output node enables you to save a Spark dataframe to the filesystem in one of the following formats: csv, json, parquet, orc.

Runtime Compatibility

  • PySpark :
  • Spark :

Example

The file_output node will save an incoming dataset to a file inside a specific folder.

{
    type: punchline
    version: "6.0"
    runtime: spark
    tenant: default
    dag: [
        {
            type: file_output
            component: output
            settings: {
                // the output format you want
                format: csv
                // Location of the output file.
                folder_path: .
                // dataset repartitioning
                number_of_repartition: 1
                // Overwrite, Append, ErrorIfExists, Ignore
                // Check https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/SaveMode.html
                save_mode: overwrite
            }
            subscribe: [
                {
                    component: input
                    stream: data
                }
            ]
        }
    ]
}
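Before submitting a punchline, it can help to sanity-check the settings block. The snippet below is a hypothetical helper, not part of the platform API; it only encodes the mandatory fields and allowed values listed in the Parameters section.

```python
# Hypothetical validator for file_output settings -- a sketch, not a
# punchline API. Field names and constraints come from the Parameters table.
VALID_FORMATS = {"json", "csv", "parquet", "orc"}
VALID_SAVE_MODES = {"overwrite", "append", "errorifexists", "ignore"}
MANDATORY = ("format", "number_of_repartition", "save_mode", "folder_path")

def validate_file_output_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means the settings look sane."""
    errors = [f"missing mandatory setting: {key}"
              for key in MANDATORY if key not in settings]
    if "format" in settings and settings["format"] not in VALID_FORMATS:
        errors.append(f"unknown format: {settings['format']}")
    if "save_mode" in settings and settings["save_mode"].lower() not in VALID_SAVE_MODES:
        errors.append(f"unknown save_mode: {settings['save_mode']}")
    if "number_of_repartition" in settings and settings["number_of_repartition"] < 1:
        errors.append("number_of_repartition must be >= 1")
    return errors

settings = {"format": "csv", "folder_path": ".",
            "number_of_repartition": 1, "save_mode": "overwrite"}
print(validate_file_output_settings(settings))  # -> []
```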

Parameters

Common Settings

| Name | Type | Mandatory | Default value | Description |
|------|------|-----------|---------------|-------------|
| format | String | true | NONE | Codec used to write the file content: json, csv, parquet or orc. |
| number_of_repartition | Integer | true | NONE | Number of partitions the dataset is written with. Setting this to 1 produces a single output file. |
| save_mode | String | true | NONE | How data should be written to the desired output (see advanced settings). |
| folder_path | String | true | NONE | Path on your filesystem where the result should be stored. |
| array_delimiter | String | false | , | With the csv format, array values are concatenated into a single column using this delimiter. |
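Since csv has no native array type, array_delimiter controls how an array column is flattened into one delimited string. A minimal pure-Python sketch of the resulting cell value (the row data is made up for illustration):

```python
# Sketch of the effect of array_delimiter on an array column in csv output:
# the array values end up joined into a single column with the delimiter.
def flatten_array_column(values, array_delimiter=","):
    """Concatenate array values into the single string written to the csv cell."""
    return array_delimiter.join(str(v) for v in values)

row = {"name": "host-1", "tags": ["prod", "linux", "ssh"]}  # made-up input row
print(flatten_array_column(row["tags"]))       # -> prod,linux,ssh
print(flatten_array_column(row["tags"], ";"))  # -> prod;linux;ssh
```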

Advanced Settings

| save_mode value | Description |
|-----------------|-------------|
| overwrite | Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame. |
| append | Append mode means that when saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data. |
| errorifexists | ErrorIfExists mode means that when saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown. |
| ignore | Ignore mode means that when saving a DataFrame to a data source, if data already exists, the save operation is expected to not save the contents of the DataFrame and to not change the existing data. |
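These four values mirror Spark's SaveMode enum. As a rough illustration only, the snippet below mimics their semantics on a plain local file; the real node delegates to Spark's DataFrameWriter, not to code like this.

```python
import os
import tempfile

def save(path: str, content: str, save_mode: str = "errorifexists") -> None:
    """Mimic Spark SaveMode semantics on a single local file (illustration only)."""
    exists = os.path.exists(path)
    if save_mode == "overwrite":
        mode = "w"                       # replace any existing data
    elif save_mode == "append":
        mode = "a"                       # add to existing data
    elif save_mode == "errorifexists":
        if exists:
            raise FileExistsError(path)  # fail fast on existing output
        mode = "w"
    elif save_mode == "ignore":
        if exists:
            return                       # silently keep existing data
        mode = "w"
    else:
        raise ValueError(f"unknown save_mode: {save_mode}")
    with open(path, mode) as f:
        f.write(content)

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "result.csv")
    save(out, "a,b\n", "overwrite")
    save(out, "c,d\n", "append")
    save(out, "ignored\n", "ignore")   # existing file is left untouched
    print(open(out).read())            # -> a,b  then  c,d
```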