File Output¶
Overview¶
The file_output node lets you save a Spark DataFrame to the filesystem in one of the following formats: csv, json, parquet, orc.
Runtime Compatibility¶
- PySpark : ✅
- Spark : ✅
Example¶
The file_output node saves an incoming dataset as a file inside a specific folder.
---
type: punchline
version: '6.0'
runtime: spark
dag:
- type: file_output
  component: output
  settings:
    format: json
    folder_path: "."
    number_of_repartition: 1
    save_mode: overwrite
  subscribe:
  - component: input
    stream: data
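For readers more familiar with the Spark API, the settings above can be read as a chain of DataFrameWriter calls. The following is only a rough sketch, assuming the node maps onto the standard `repartition`, `write.format`, `mode` and `save` calls; the helper name `write_like_file_output` is hypothetical and is not part of the punchline API.

```python
SUPPORTED_FORMATS = {"csv", "json", "parquet", "orc"}


def write_like_file_output(df, settings):
    """Write `df` (a PySpark DataFrame) using keys named after the node settings.

    Assumption: this mirrors what the node does; the real implementation
    may differ (for example in how it handles array_delimiter).
    """
    fmt = settings["format"]
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    (
        df.repartition(settings["number_of_repartition"])  # 1 => a single part file
        .write
        .format(fmt)
        .mode(settings["save_mode"])  # overwrite, append, errorifexists, ignore
        .save(settings["folder_path"])
    )
```

With the example settings, this would be called as `write_like_file_output(df, {"format": "json", "folder_path": ".", "number_of_repartition": 1, "save_mode": "overwrite"})`.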
Parameters¶
Common Settings¶
Name | Type | Mandatory | Default value | Description |
---|---|---|---|---|
format | String | true | NONE | Format used to write the file content: one of json, csv, parquet, orc. |
number_of_repartition | Integer | true | NONE | Number of partitions the dataset is repartitioned into before writing. Setting this option to 1 gives you a single file as a result... |
save_mode | String | true | NONE | How data is written when the output already exists (see Advanced Settings). |
folder_path | String | true | NONE | Path of the folder on your filesystem where the result is stored. |
array_delimiter | String | false | , | When using the csv format, array values are concatenated with this delimiter into a single column. |
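The array_delimiter setting exists because a csv cell can only hold a flat value. The plain-Python sketch below illustrates the idea of joining array values into one column; it is an illustration only, and the function names `flatten_arrays` and `rows_to_csv` are hypothetical rather than taken from the node.

```python
import csv
import io


def flatten_arrays(row, array_delimiter=","):
    """Join list/tuple cells into a single string; leave other cells untouched."""
    return [
        array_delimiter.join(str(value) for value in cell)
        if isinstance(cell, (list, tuple))
        else cell
        for cell in row
    ]


def rows_to_csv(rows, array_delimiter=","):
    """Render rows as csv text after flattening array cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(flatten_arrays(row, array_delimiter))
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([["alice", ["admin", "dev"]]], array_delimiter=";"))
```

Choosing a delimiter that differs from the csv column separator (for instance `;` or `|`) keeps the joined values readable without relying on quoting.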
Advanced Settings¶
save_mode | Type | Default value | Description |
---|---|---|---|
overwrite | String | NONE | Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame. |
append | String | NONE | Append mode means that when saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data. |
errorifexists | String | NONE | ErrorIfExists mode means that when saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown. |
ignore | String | NONE | Ignore mode means that when saving a DataFrame to a data source, if data already exists, the save operation is expected to not save the contents of the DataFrame and to not change the existing data. |
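The four modes differ only in what happens when the target already exists. The sketch below models that decision on a local folder with plain Python; it is a simplified illustration, not how Spark or the node implements it, and `save_text` is a hypothetical helper that treats "data already exists" as "the folder exists".

```python
import shutil
from pathlib import Path

SAVE_MODES = {"overwrite", "append", "errorifexists", "ignore"}


def save_text(folder, filename, content, save_mode):
    """Write `content` to folder/filename following the given save mode.

    Returns True if something was written, False if the write was skipped.
    """
    if save_mode not in SAVE_MODES:
        raise ValueError(f"unknown save_mode: {save_mode!r}")

    target = Path(folder)
    if target.exists():
        if save_mode == "errorifexists":
            raise FileExistsError(f"{target} already exists")
        if save_mode == "ignore":
            return False  # keep existing data, write nothing
        if save_mode == "overwrite":
            shutil.rmtree(target)  # drop existing data before writing
        # "append" falls through: existing files are kept

    target.mkdir(parents=True, exist_ok=True)
    (target / filename).write_text(content)
    return True
```

In this model, overwrite is the only mode that removes existing files, and ignore is the only mode that returns without writing or raising.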