SFTP Input

Introduction

This node can be used to fetch files over the SSH protocol (SFTP). You can download files to a specific path and persist them, or download them to a temporary directory.

Most of this node's parameters are similar to those you are used to on an sshd server. They are set through one of these keys:

  • sftp.session or
  • sftp.channel

Example(s)

{
    version: "6.0"
    type: punchline
    runtime: storm
    dag: [
        {
            type: sftp_input
            component: input
            settings: {
                cleanup: false
                download_path: /tmp/toto
                sftp_settings: {
                    sftp.ssh.host: mysftphost.com
                    sftp.ssh.auth.user: auser
                    sftp.ssh.auth.pass: apass
                    sftp.ssh.file.name_regex: "*.txt"
                    sftp.ssh.scan_directories: [
                        dir1,
                        dir2
                    ]
                }
            }
            publish: [
                {
                    stream: files
                    fields: [
                        meta
                    ]
                }
            ]
            subscribe: []
        }
        {
            type: punchlet_node
            component: stdout
            settings: {
                punchlet_code: "{ print(root); }"
            }
            subscribe: [
                {
                    stream: files
                    component: input
                }
            ]
            publish: [
                {
                    stream: files
                    fields: [
                        meta
                    ]
                }
            ]   
        }
    ]
}

Settings

Main Settings

| Name | Type | Default Value | Mandatory | Description |
|------|------|---------------|-----------|-------------|
| sftp_settings | Map (k-v) | None | true | A map of key (string) to value (string) parameters. See the SFTP_SETTINGS section below for more information. |
| cleanup | boolean | true | false | Wipe everything under the download path after the punchline has finished or failed. |
| download_path | String | System tmp dir | false | Absolute path that this node uses as storage for downloaded files. |
| download_ignore_suffix | String | None | false | Suffix to ignore in the real path of a file when it is retrieved. Normally used together with download_add_suffix. |
| download_add_suffix | String | None | false | Suffix appended to the real path used for retrieval. Applies when download_ignore_suffix is set. |
| consumer_mode | String | earliest | false | earliest (consume all files) or last_committed (consume only files not consumed before). |
| checkpoint_settings | Map (k-v) | None | false | Required if consumer_mode is last_committed. See the CHECKPOINT_SETTINGS section below. |

SFTP_SETTINGS

| Name | Type | Default Value | Mandatory | Description |
|------|------|---------------|-----------|-------------|
| sftp.ssh.auth.user | String | None | true | SSH user name for the SFTP server. |
| sftp.ssh.auth.pass | String | None | true | SSH user password for the SFTP server. |
| sftp.ssh.port | String | 22 | false | Port of your SFTP server. |
| sftp.ssh.host | String | None | true | Host name of your SFTP server. |
| sftp.ssh.file.name_regex | String | * | false | Valid Unix expression used to match the names of the files to fetch. |
| sftp.ssh.timeout | String | 0 | false | Inactivity timeout for the SFTP channel and session. |
| sftp.ssh.scan_directories | List of String | None | false | Folders to scan. Each element of the list is the name of a folder whose files you want to scan. Only the top-level files of a given folder are scanned. |
| sftp.session | Map (k-v) | None | false | Valid SFTP configuration for a session (unix-like). See /etc/ssh/ssh_config. |
| sftp.channel | Map (k-v) | None | false | Valid SFTP configuration for a channel (unix-like). |
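
As an illustration of how these keys combine, here is a minimal sketch of an sftp_settings block that sets a non-default port and passes one option through sftp.session. The port value and the PreferredAuthentications entry are assumptions for illustration only: the option name comes from ssh_config, and whether the node and your server honor it is not confirmed by this page.

```hocon
sftp_settings: {
    sftp.ssh.host: mysftphost.com
    # Hypothetical non-default port, given as a string as in the table above
    sftp.ssh.port: "2222"
    sftp.ssh.auth.user: auser
    sftp.ssh.auth.pass: apass
    sftp.ssh.file.name_regex: "*.txt"
    sftp.ssh.scan_directories: [
        dir1
    ]
    sftp.session: {
        # Assumption: ssh_config-style option name, shown for illustration only
        PreferredAuthentications: password
    }
}
```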

CHECKPOINT_SETTINGS

| Name | Type | Default Value | Mandatory | Description |
|------|------|---------------|-----------|-------------|
| checkpoint.application_runtime_id | String | None | true | A unique ID for the running pipeline. |
| checkpoint.es_index_prefix | String | None | true | Prefix of the Elasticsearch index in which checkpoints are stored. It should end with '-'; a date suffix (year.month.day) is appended to the name. |
| checkpoint.es_nodes | List (Map k-v) | None | true | Each map should have 2 fields: host (String) and port (int). |
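
To show how the checkpoint settings fit together with last_committed mode, here is a minimal sketch of the settings block of an sftp_input node. The runtime ID, the index prefix, and the Elasticsearch host and port are hypothetical values chosen for illustration. The mode is set with the consumer_mode key, as named in the Main Settings table.

```hocon
settings: {
    download_path: /tmp/toto
    # Only fetch files that were not consumed by a previous run
    consumer_mode: last_committed
    checkpoint_settings: {
        # Hypothetical ID: must be unique to this pipeline
        checkpoint.application_runtime_id: sftp_input_example
        # Hypothetical prefix: ends with '-' so the date suffix can be appended
        checkpoint.es_index_prefix: "sftp-checkpoint-"
        # Hypothetical Elasticsearch node
        checkpoint.es_nodes: [
            {
                host: localhost
                port: 9200
            }
        ]
    }
    sftp_settings: {
        sftp.ssh.host: mysftphost.com
        sftp.ssh.auth.user: auser
        sftp.ssh.auth.pass: apass
    }
}
```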

Note: Some settings for sftp.channel and sftp.session only work if the SFTP server is configured to support them.