File Transfer Output

Introduction

The file transfer output node, in contrast to the file output node, enables you to transfer files from one end to another.

Supported destination outputs are:

  • S3
  • HDFS
  • FILESYSTEM

More will be added later...

For each file transferred, meta information about that file is published. This metadata can be stored in a Kafka topic for later data processing!

Note: the received stream this node subscribes to is expected to contain the field:

  • remote_file_last_modified_timestamp (long, in seconds)
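As a minimal sketch, a subscribed record carrying this field might look like the following. The exact tuple layout and field values are hypothetical; the `local_downloaded_file_path` field name matches the `received_file_path` setting used in the example below:

```json
{
    "meta": {
        "local_downloaded_file_path": "/tmp/toto/myfile.nm4",
        "remote_file_last_modified_timestamp": 1622541000
    }
}
```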

Example(s)

The example below illustrates how to fetch files from an SFTP server locally and send them to an S3 cluster, with the metadata of all sent files stored in a Kafka topic.

{
    version: "6.0"
    type: punchline
    runtime: storm
    dag: [
        {
            type: sftp_input
            component: input
            settings: {
                cleanup: false
                download_path: /tmp/toto
                download_ignore_suffix: ".complete"
                download_add_suffix: ".nm4"
                consume_mode: earliest
                sftp_settings: {
                    sftp.ssh.host: server.com
                    sftp.ssh.auth.user: user
                    sftp.ssh.auth.pass: pass
                    sftp.ssh.file.name_regex: "*.complete"
                    sftp.ssh.scan_directories: [
                        SAISData
                    ]
                }
                checkpoint_settings: {
                    checkpoint.application_runtime_id: sftp_application_id_test
                    checkpoint.es_index_prefix: mytenant-test-
                    checkpoint.es_nodes: [
                        {
                            host: localhost
                        }
                    ]
                }
            }
            publish: [
                {
                    stream: files
                    fields: [
                        meta
                    ]
                }
            ]
            subscribe: [

            ]
        }
        {
            type: punchlet_node
            component: punch
            settings: {
                punchlet_code: "{ print(root); }"
            }
            subscribe: [
                {
                    stream: files
                    component: input
                }
            ]
            publish: [
                {
                    stream: files
                    fields: [
                        meta
                    ]
                }
            ]   
        }
        {
            type: file_transfer_output
            component: output
            settings: {
                destination_folder: s3a://punch/transferred
                received_file_path: local_downloaded_file_path
                hadoop_settings: {
                    fs.s3a.access.key: minioadmin
                    fs.s3a.secret.key: minioadmin
                    fs.s3a.endpoint: http://127.0.0.1:9000
                }
            }
            subscribe: [
                {
                    stream: files
                    component: punch
                }
            ]
            publish: [
                {
                    stream: files
                    fields: [
                        meta
                    ]
                }
            ]
        }
        {
            type: kafka_output
            component: kafka_out
            settings: {
                topic: hello_world
                brokers: local
                encoding: json
                producer.acks: all
                producer.batch.size: 16384
                producer.linger.ms: 5
            }
            subscribe: [
                {
                    stream: files
                    component: output
                }
            ]
        }
    ]
}

Settings

Main Settings

| Name | Type | Default Value | Mandatory | Description |
| --- | --- | --- | --- | --- |
| destination_folder | String | None | True | Absolute path where data will be stored. The prefix determines the storage abstraction layer, e.g. s3a:// refers to S3. |
| received_file_path | String | None | True | The stream this node subscribes to must contain a field whose name matches this value. The matched field's value should be an absolute path to a local file. |
| hadoop_settings | Map K-V | None | False | Not mandatory, but required when you want to set credentials such as a username and password. |
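As a minimal sketch, a file_transfer_output node targeting the local filesystem could omit hadoop_settings entirely. The file:// prefix shown here is an assumption based on the prefix-based dispatch described above:

```json
{
    type: file_transfer_output
    component: output
    settings: {
        # file:// is assumed to select the FILESYSTEM backend
        destination_folder: file:///tmp/transferred
        # name of the subscribed field holding the local absolute file path
        received_file_path: local_downloaded_file_path
    }
    subscribe: [
        {
            stream: files
            component: punch
        }
    ]
}
```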