
File Output

Before you start...


The file_output node enables you to save a Spark dataframe to the filesystem in one of the following formats: csv, json, parquet or orc.


Examples

Use-cases

Our "hello world" punchline configuration.

beginner_use_case.punchline

The file_output node will save an incoming dataset to a file inside a specific folder.

{
    type: punchline
    version: "6.0"
    runtime: spark
    tenant: default
    dag: [
        {
            type: file_output
            component: output
            settings: {
                // the output format you want
                format: csv

                // Folder where the output file(s) will be written.
                folder_path: ./folder

                // number of output partitions (1 produces a single output file)
                number_of_repartition: 1

                // overwrite, append, errorifexists, ignore
                // Check https://spark.apache.org/docs/2.4.3/api/java/org/apache/spark/sql/SaveMode.html
                save_mode: overwrite
            }
            subscribe: [
                {
                    component: input
                    stream: data
                }
            ]
        }
    ]
}

Run beginner_use_case.punchline using the command below:

CONF=beginner_use_case.punchline
punchlinectl start -p $CONF

Pyspark

Coming soon

Spark

Coming soon

Parameters

Common Settings

| Name | Type | Mandatory | Default value | Description |
|------|------|-----------|---------------|-------------|
| format | String | true | NONE | Format used to write the output file content: json, csv, parquet or orc. |
| number_of_repartition | Integer | true | NONE | Number of partitions the dataset is repartitioned into before writing. Setting this option to 1 produces a single output file. |
| save_mode | String | true | NONE | How data should be written to the desired output folder (see advanced settings). |
| folder_path | String | true | NONE | Path where the result should be stored on your filesystem. |
| array_delimiter | String | false | , | When using the csv format, array values are concatenated into a single column using this delimiter (see the example after this table). |
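
For example, the settings below write a csv output where array columns are flattened into a single column using a pipe delimiter instead of the default comma. This is a minimal sketch based on the beginner example above; the delimiter value and folder path are illustrative.

{
    type: file_output
    component: output
    settings: {
        format: csv
        folder_path: ./folder
        number_of_repartition: 1
        save_mode: overwrite

        // illustrative: concatenate array values into a single csv column, separated by "|"
        array_delimiter: "|"
    }
    subscribe: [
        {
            component: input
            stream: data
        }
    ]
}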

Advanced Settings

| save_mode | Type | Default value | Description |
|-----------|------|---------------|-------------|
| overwrite | String | NONE | Overwrite mode means that when saving a DataFrame to a data source, if data/table already exists, existing data is expected to be overwritten by the contents of the DataFrame. |
| append | String | NONE | Append mode means that when saving a DataFrame to a data source, if data/table already exists, contents of the DataFrame are expected to be appended to existing data (see the example below). |
| errorifexists | String | NONE | ErrorIfExists mode means that when saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown. |
| ignore | String | NONE | Ignore mode means that when saving a DataFrame to a data source, if data already exists, the save operation is expected to not save the contents of the DataFrame and to not change the existing data. |
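
As an illustration of save_mode, the settings below reuse the beginner example but append new rows to whatever already exists in ./folder instead of replacing it. This is a sketch: only save_mode differs from the earlier configuration.

{
    type: file_output
    component: output
    settings: {
        format: csv
        folder_path: ./folder
        number_of_repartition: 1

        // keep existing files in ./folder and add the new dataset alongside them
        save_mode: append
    }
    subscribe: [
        {
            component: input
            stream: data
        }
    ]
}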