SFTP Input¶
Introduction¶
This node can be used to fetch files over the SSH protocol (SFTP). Files can either be downloaded to a specific path and kept there, or downloaded to a temporary directory.
Most of this node's parameters are similar to those you are used to on an sshd server; they are set under one of these keys:
sftp.session
sftp.channel
Example(s)¶
{
  version: "6.0"
  type: punchline
  runtime: storm
  dag: [
    {
      type: sftp_input
      component: input
      settings: {
        cleanup: false
        download_path: /tmp/toto
        sftp_settings: {
          sftp.ssh.host: mysftphost.com
          sftp.ssh.auth.user: auser
          sftp.ssh.auth.pass: apass
          sftp.ssh.file.name_regex: "*.txt"
          sftp.ssh.scan_directories: [
            dir1,
            dir2
          ]
        }
      }
      publish: [
        {
          stream: files
          fields: [
            meta
          ]
        }
      ]
      subscribe: [
      ]
    }
    {
      type: punchlet_node
      component: stdout
      settings: {
        punchlet_code: "{ print(root); }"
      }
      subscribe: [
        {
          stream: files
          component: input
        }
      ]
      publish: [
        {
          stream: files
          fields: [
            meta
          ]
        }
      ]
    }
  ]
}
Settings¶
Main Settings¶
Name | Type | Default Value | Mandatory | Description |
---|---|---|---|---|
sftp_settings | Map (k-v) | None | true | A map of key-value parameters. See the SFTP_SETTINGS section below for more information |
cleanup | boolean | true | false | Delete all downloaded files once the punchline has finished or failed |
download_path | String | System tmp dir | false | Absolute path of the directory this node uses to store downloaded files |
download_ignore_suffix | String | None | false | Suffix to ignore in the real path of a file when it is retrieved; normally used together with download_add_suffix |
download_add_suffix | String | None | false | Suffix to append to the real path used for retrieving a file; applies when download_ignore_suffix is set |
consumer_mode | String | earliest | false | earliest (consume all files) or last_committed (consume only files not consumed before) |
checkpoint_settings | Map (k-v) | None | false | Required if consumer_mode is last_committed. See the CHECKPOINT_SETTINGS section below |
SFTP_SETTINGS¶
Name | Type | Default Value | Mandatory | Description |
---|---|---|---|---|
sftp.ssh.auth.user | String | None | true | ssh user name for sftp server |
sftp.ssh.auth.pass | String | None | true | ssh user password for sftp server |
sftp.ssh.port | String | 22 | false | port of your sftp server |
sftp.ssh.host | String | None | true | host name of your sftp server |
sftp.ssh.file.name_regex | String | * | false | Unix-style file name pattern used to select files (for example *.txt) |
sftp.ssh.timeout | String | 0 | false | Inactivity timeout applied to the sftp channel and session |
sftp.ssh.scan_directories | List of Str | None | false | Only the top-level files of a given folder are scanned. To scan files within folders, use this parameter: each element of the list is the name of a folder you want to scan |
sftp.session | Map k-v | None | false | valid sftp configuration for a session (unix-like) - see /etc/ssh/ssh_config |
sftp.channel | Map k-v | None | false | valid sftp configuration for a channel (unix-like) |
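As an illustration of the optional keys above, the fragment below extends the sftp_settings block of the example with a non-default port and a session option. It is only a sketch under assumptions: the host, credentials and port value are placeholders, and StrictHostKeyChecking is assumed to be accepted under sftp.session because it is a standard ssh_config option. Check which options your version actually supports.

```hocon
sftp_settings: {
  sftp.ssh.host: mysftphost.com
  sftp.ssh.auth.user: auser
  sftp.ssh.auth.pass: apass
  # The port is declared as a String in the table above; "2222" is a placeholder.
  sftp.ssh.port: "2222"
  # Assumption: an ssh_config-style option passed through to the session.
  sftp.session: {
    StrictHostKeyChecking: "no"
  }
}
```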
CHECKPOINT_SETTINGS¶
Name | Type | Default Value | Mandatory | Description |
---|---|---|---|---|
checkpoint.application_runtime_id | String | None | true | A unique ID for the running pipeline |
checkpoint.es_index_prefix | String | None | true | Prefix of the Elasticsearch index where checkpoints are stored. It should end with '-'; a date suffix (year.month.day) is appended to it |
checkpoint.es_nodes | List(Map k-v) | None | true | Each map should have 2 fields: host -> String and port -> int |
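The settings fragment below sketches how the three checkpoint keys might be combined with consumer_mode set to last_committed. The runtime ID, index prefix and Elasticsearch node are hypothetical values chosen for illustration; only the key names come from the tables above.

```hocon
settings: {
  consumer_mode: last_committed
  checkpoint_settings: {
    # Hypothetical ID; it must be unique to this pipeline.
    checkpoint.application_runtime_id: sftp_input_example
    # Ends with '-' so that the date suffix (year.month.day) can be appended.
    checkpoint.es_index_prefix: "sftp-checkpoint-"
    # Hypothetical Elasticsearch node: host is a String, port is an int.
    checkpoint.es_nodes: [
      {
        host: localhost
        port: 9200
      }
    ]
  }
  sftp_settings: {
    sftp.ssh.host: mysftphost.com
    sftp.ssh.auth.user: auser
    sftp.ssh.auth.pass: apass
  }
}
```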
Note: Some settings for sftp.channel and sftp.session require the sftp server to be configured accordingly in order to work.