File Output

Overview

The file_output node lets you save a Spark dataframe to the filesystem in one of the following formats: csv, json, parquet, orc.

Runtime Compatibility

  • PySpark :
  • Spark :

Example

The file_output node will save an incoming dataset to a file inside a specific folder.

---
type: punchline
version: '6.0'
runtime: spark
dag:
- type: file_output
  component: output
  settings:
    format: json
    folder_path: "."
    number_of_repartition: 1
    save_mode: overwrite
  subscribe:
  - component: input
    stream: data

Parameters

Common Settings

Name | Type | Mandatory | Default value | Description
format | String | true | NONE | Codec used to write the file content: one of json, csv, parquet, orc.
number_of_repartition | Integer | true | NONE | Number of partitions the dataset is repartitioned into before it is written. Setting this option to 1 produces a single output file.
save_mode | String | true | NONE | How data is written when the desired output already exists (see Advanced Settings).
folder_path | String | true | NONE | Path of the folder on your filesystem where the result is stored.
array_delimiter | String | false | , | With the csv format, delimiter used to concatenate array values into a single column.
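
As an illustration of the settings above, here is a minimal sketch of a csv output that overrides the default array delimiter. It reuses only the settings listed in the table; the folder path is a hypothetical value, and the upstream node named input publishing a stream named data is assumed to exist, as in the example above.

```yaml
---
type: punchline
version: '6.0'
runtime: spark
dag:
- type: file_output
  component: output
  settings:
    format: csv
    # hypothetical output folder, adjust to your filesystem
    folder_path: "/tmp/file_output_demo"
    # 1 partition: a single result file is expected
    number_of_repartition: 1
    save_mode: overwrite
    # array values are concatenated with ';' instead of the default ','
    array_delimiter: ";"
  subscribe:
  # assumed upstream node, not defined in this fragment
  - component: input
    stream: data
```

Choosing a delimiter other than the comma can help when the csv column separator is itself a comma, so that concatenated array values are easier to tell apart from column boundaries.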

Advanced Settings

save_mode | Type | Default value | Description
overwrite | String | NONE | If the data or table already exists, the existing data is overwritten by the contents of the DataFrame.
append | String | NONE | If the data or table already exists, the contents of the DataFrame are appended to the existing data.
errorifexists | String | NONE | If the data already exists, an exception is thrown.
ignore | String | NONE | If the data already exists, the contents of the DataFrame are not saved and the existing data is left unchanged.
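
To make the save modes above concrete, the following sketch switches the example node to append, so that successive runs are expected to add new files next to the existing ones instead of replacing them. The folder path and the partition count are illustrative assumptions, and the upstream input node is assumed to exist as in the example above.

```yaml
---
type: punchline
version: '6.0'
runtime: spark
dag:
- type: file_output
  component: output
  settings:
    format: parquet
    # hypothetical output folder, adjust to your filesystem
    folder_path: "/tmp/file_output_history"
    # illustrative value: more than 1 partition typically yields several part files
    number_of_repartition: 4
    # keep what is already in the folder and add the new data to it
    save_mode: append
  subscribe:
  # assumed upstream node, not defined in this fragment
  - component: input
    stream: data
```

Replacing append with errorifexists in the same fragment would instead be expected to make the run fail when the folder already contains data, which can be useful to protect a previous result from accidental modification.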