File Transfer Output¶
Introduction¶
The file transfer output node, in contrast to the file output node, enables you to transfer files from one location to another.
Supported destinations are:
- S3
- HDFS
- FILESYSTEM
More will be added later...
For each transferred file, meta information is published. This information can be stored in a Kafka topic for later data processing.
Note: the stream this node is subscribed to is expected to contain the field:
- remote_file_last_modified_timestamp (long in seconds)
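The timestamp must be expressed in epoch seconds, not milliseconds. As an illustration only (this snippet is not part of the node), here is a minimal Python sketch showing how such a value can be derived from a local file's modification time:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"sample payload")
    path = f.name

# os.path.getmtime returns a float in seconds; the node expects
# a long in seconds, so truncate rather than keep the fraction.
remote_file_last_modified_timestamp = int(os.path.getmtime(path))

print(remote_file_last_modified_timestamp)
os.unlink(path)
```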
Example(s)¶
The example below illustrates how to fetch files from an SFTP server to the local filesystem, send them to an S3 cluster, and store the metadata of all sent files in a Kafka topic.
{
version: "6.0"
type: punchline
runtime: storm
dag: [
{
type: sftp_input
component: input
settings: {
cleanup: false
download_path: /tmp/toto
download_ignore_suffix: ".complete"
download_add_suffix: ".nm4"
consume_mode: earliest
sftp_settings: {
sftp.ssh.host: server.com
sftp.ssh.auth.user: user
sftp.ssh.auth.pass: pass
sftp.ssh.file.name_regex: ".*\\.complete"
sftp.ssh.scan_directories: [
SAISData
]
}
checkpoint_settings: {
checkpoint.application_runtime_id: sftp_application_id_test
checkpoint.es_index_prefix: mytenant-test-
checkpoint.es_nodes: [
{
host: localhost
}
]
}
}
publish: [
{
stream: files
fields: [
meta
]
}
]
subscribe: [
]
}
{
type: punchlet_node
component: punch
settings: {
punchlet_code: "{ print(root); }"
}
subscribe: [
{
stream: files
component: input
}
]
publish: [
{
stream: files
fields: [
meta
]
}
]
}
{
type: file_transfer_output
component: output
settings: {
destination_folder: s3a://punch/transferred
received_file_path: local_downloaded_file_path
hadoop_settings: {
fs.s3a.access.key: minioadmin
fs.s3a.secret.key: minioadmin
fs.s3a.endpoint: http://127.0.0.1:9000
}
}
subscribe: [
{
stream: files
component: punch
}
]
publish: [
{
stream: files
fields: [
meta
]
}
]
}
{
type: kafka_output
component: kafka_out
settings: {
topic: hello_world
brokers: local
encoding: json
producer.acks: all
producer.batch.size: 16384
producer.linger.ms: 5
}
subscribe: [
{
stream: files
component: output
}
]
}
]
}
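Once the punchline runs, each metadata record lands as a JSON document in the hello_world topic. The exact field layout depends on your version; the sketch below assumes a hypothetical record shape (the field names are illustrative, not guaranteed by the node) and simply shows how a downstream consumer might decode one message:

```python
import json

# Hypothetical metadata record, as it might appear in the
# hello_world topic (field names are illustrative only).
raw_message = (
    b'{"meta": {'
    b'"local_downloaded_file_path": "/tmp/toto/file1.nm4", '
    b'"size": 1024, '
    b'"remote_file_last_modified_timestamp": 1609459200}}'
)

record = json.loads(raw_message)
meta = record["meta"]

# The transferred file's original local path and its last
# modification time (epoch seconds) are now available for
# downstream processing.
print(meta["local_downloaded_file_path"])
print(meta["remote_file_last_modified_timestamp"])
```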
Settings¶
Main Settings¶
Name | Type | Default Value | Mandatory | Description |
---|---|---|---|---|
destination_folder | String | None | True | Absolute path where data will be stored. The path prefix determines the storage abstraction layer, e.g. s3a:// refers to S3 |
received_file_path | String | None | True | The stream this node is subscribed to must contain a field whose name matches the value of this parameter. The value of that field must be an absolute path to a local file |
hadoop_settings | Map K-V | None | False | Optional, but required when you need to provide connection settings such as credentials (e.g. an S3 access key and secret key) |
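To make the prefix-based dispatch of destination_folder concrete, here is a small Python sketch of the idea. The real node resolves the backend through the Hadoop filesystem API; this function is purely illustrative:

```python
def storage_backend(destination_folder: str) -> str:
    # Hypothetical mapping, shown only to illustrate how the
    # path prefix selects the storage abstraction layer.
    if destination_folder.startswith("s3a://"):
        return "S3"
    if destination_folder.startswith("hdfs://"):
        return "HDFS"
    return "FILESYSTEM"

print(storage_backend("s3a://punch/transferred"))  # → S3
print(storage_backend("/tmp/archive"))             # → FILESYSTEM
```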