
Archive Reader

Overview

Compatible with the Spark and PySpark runtimes.

The archive_reader node extracts archived data from S3, Ceph, or a filesystem, using metadata stored in Elasticsearch.

This node must follow an Elastic Batch Input node providing metadata about the archived data.

{
    type: punchline
    version: "6.0"
    runtime: spark
    tenant: default
    dag: 
    [
        {
            type: elastic_batch_input
            component: input
            settings: {
                index: mytenant-archive-*
                cluster_name: es_search
                nodes: [
                    localhost
                ]
                query: {
                    bool: {
                        must: [
                            {
                                term: {
                                    pool: mytenant
                                }
                            }
                        ]
                    }
                }
            }
            publish: [
                {
                    stream: metadata 
                }
            ]
        }
    ]
}
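
The archive_reader node itself then subscribes to the metadata stream published by this input. Below is a minimal sketch of that node as it would be appended to the same dag; the component name and the output stream name are illustrative assumptions:

{
    type: archive_reader
    component: reader
    subscribe: [
        {
            # the elastic_batch_input node declared above
            component: input
            stream: metadata
        }
    ]
    publish: [
        {
            # illustrative output stream carrying the extracted data
            stream: data
        }
    ]
}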

Filters

The Archive Reader reads one batch for every metadata document received as input. Therefore, any filtering or selection on dates, pool, tags, etc. must be applied in the Elasticsearch query.

{
    type: elastic_batch_input
    component: input
    settings: {
        index: mytenant-archive-*
        cluster_name: es_search
        nodes: [
            localhost
        ]
    }
    publish: [
        {
            stream: metadata 
        }
    ]
}
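
For example, to restrict the read to a given pool and time window, a query such as the following could be added to the settings block above. The term filter on pool comes from the first example; the range filter and its timestamp field name are illustrative assumptions that must match your metadata mapping:

query: {
    bool: {
        must: [
            {
                term: {
                    pool: mytenant
                }
            }
            {
                range: {
                    # hypothetical timestamp field: adjust to your metadata mapping
                    timestamp: {
                        gte: "2021-01-01T00:00:00Z"
                        lte: "2021-01-31T23:59:59Z"
                    }
                }
            }
        ]
    }
}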

Configuration(s)

  • device_address: String

    Description: [Optional] Address of the device where the data is stored. Starts with file://, ceph_configuration://, http://, etc.

  • user: String

    Description: [Optional] The user used to access the Ceph cluster.

  • access-key: String

    Description: [Optional] Access key for the MinIO cluster.

  • secret-key: String

    Description: [Optional] Secret key for the MinIO cluster. A combined settings sketch follows this list.
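
Putting these settings together, here is a minimal sketch of an archive_reader node configured for a MinIO (S3-compatible) backend; the endpoint and credentials are placeholder assumptions:

{
    type: archive_reader
    component: reader
    settings: {
        # placeholder endpoint and credentials: adjust to your MinIO service
        device_address: http://localhost:9000
        access-key: minioadmin
        secret-key: minioadmin
    }
    subscribe: [
        {
            component: input
            stream: metadata
        }
    ]
}

For a Ceph backend, device_address would instead use the ceph_configuration:// scheme together with the user setting.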