
File Output

Overview

The file_output node enables you to save a Spark dataframe to the filesystem in one of the following formats: csv, json, parquet, orc.

Runtime Compatibility

  • PySpark
  • Spark

Example

The file_output node will save an incoming dataset to a file inside a specific folder.

{
  type: punchline
  version: "6.0"
  runtime: spark
  tenant: default
  dag:
  [
    {
      type: file_output
      component: output
      settings:
      {
        # output codec: one of json, csv, parquet, orc
        format: json
        # write into the current working directory
        folder_path: .
        # a single partition yields a single output file
        number_of_repartition: 1
        # replace any existing data in the target folder
        save_mode: overwrite
      }
      subscribe:
      [
        {
          component: input
          stream: data
        }
      ]
    }
  ]
}
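For comparison, here is a minimal PySpark sketch of what such a configuration roughly amounts to with the standard DataFrameWriter API. The DataFrame contents, variable names and output path are illustrative assumptions, not part of the punchline configuration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file_output_sketch").getOrCreate()

# Stand-in for the dataset received on the subscribed "data" stream (assumption).
df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

# number_of_repartition -> repartition(1), save_mode -> mode("overwrite"),
# format -> format("json"), folder_path -> save(...)
(df.repartition(1)
   .write
   .mode("overwrite")
   .format("json")
   .save("./json_out"))  # illustrative folder instead of "."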

Parameters

Common Settings

| Name | Type | Mandatory | Default value | Description |
|---|---|---|---|---|
| format | String | true | NONE | Codec used to write the file content: json, csv, parquet, orc. |
| number_of_repartition | Integer | true | NONE | Number of partitions used when writing. Setting this to 1 produces a single output file. |
| save_mode | String | true | NONE | How data is written when the target already exists (see Advanced Settings). |
| folder_path | String | true | NONE | Path on your filesystem where the result is stored. |
| array_delimiter | String | false | , | csv format only: array values are concatenated into a single column using this delimiter. |
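Spark's built-in csv writer cannot serialise array columns directly, so a reasonable reading of array_delimiter is that array values are flattened into one delimited string column before the write. Below is a minimal PySpark sketch of that presumed behaviour, with hypothetical column names and paths.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("array_delimiter_sketch").getOrCreate()

# Hypothetical dataset with an array column.
df = spark.createDataFrame([("a", ["x", "y", "z"])], ["key", "tags"])

# Presumed effect of array_delimiter "," on csv output: the array is
# concatenated into a single comma-separated string column.
flat = df.withColumn("tags", F.concat_ws(",", "tags"))

flat.write.mode("overwrite").option("header", True).csv("./csv_out")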

Advanced Settings

| save_mode value | Description |
|---|---|
| overwrite | When saving a DataFrame to a data source, if data/table already exists, the existing data is overwritten by the contents of the DataFrame. |
| append | When saving a DataFrame to a data source, if data/table already exists, the contents of the DataFrame are appended to the existing data. |
| errorifexists | When saving a DataFrame to a data source, if data already exists, an exception is thrown. |
| ignore | When saving a DataFrame to a data source, if data already exists, the save operation does not save the contents of the DataFrame and leaves the existing data unchanged. |
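These values follow Spark's standard SaveMode semantics. The short PySpark sketch below shows how they differ once the target folder already contains data; the data and path are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

spark = SparkSession.builder.appName("save_mode_sketch").getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "value"])

out = "./parquet_out"  # illustrative folder
df.write.mode("overwrite").parquet(out)   # replaces any existing data
df.write.mode("append").parquet(out)      # adds new files next to the existing ones
df.write.mode("ignore").parquet(out)      # leaves existing data untouched, writes nothing

try:
    df.write.mode("errorifexists").parquet(out)
except AnalysisException as error:
    print(f"errorifexists raised as expected: {error}")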