
Archives Housekeeping

Abstract

This chapter explains how to set up the data lifecycle of your archives.

Overview

If you use the punch archives to store data on long-term object storage, you also need a strategy to clean up data that is too old. Just like for the Elasticsearch data housekeeping, the punch provides a ready-to-use archive housekeeping job that you can include in one of your channels.

Configuration

First, add the following job to the channel's channel_structure.json descriptor file:

{
    "version" : "5.0",
    "jobs" : [
        {
            "type" : "shiva",
            "name" : "archives-housekeeping",
            "command" : "archives-housekeeping",
            "args": [
                "archives-housekeeping.json"
            ],
            "resources": [
                "archives-housekeeping.json"
            ],
            "cluster" : "common",
            "shiva_runner_tags" : ["standalone"],
            "quartzcron_schedule" : "0 0 * ? * * *"
        }
    ]
}

The quartzcron_schedule property uses the seven-field Quartz cron format (seconds, minutes, hours, day-of-month, month, day-of-week, year); the expression above runs the job at the top of every hour.

Then provide an archives-housekeeping.json resource file to define your settings. Here is an example:

{
  "archiving_pools": [
    {
      "destinations": ["file:///tmp/storage"],
      "pool": "mytenant-data",
      "retention": "3m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "user": "admin"
    }
  ]
}
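
Before deploying, you can sanity-check the file's JSON syntax; a quick way (assuming a standard python3 installation) is:

$ python3 -m json.tool archives-housekeeping.json

The command pretty-prints the file on success and reports the offending line on a syntax error.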

Watch out! This service relies on the Elasticsearch mapping template mapping_archive.json, in particular its topic_name field. If you need to load this mapping, use this command:

$ curl -X POST localhost:9200/_template/mapping_archive \
       -H "Content-Type: application/json" \
       -d @resources/elasticsearch/mapping_archive.json
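
You can then verify that the template was actually loaded:

$ curl localhost:9200/_template/mapping_archive?pretty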

Parameters

Mandatory Parameters

  • destinations (array of strings)

    Array of destinations. For a Ceph cluster, each destination is the absolute path to the Ceph cluster configuration file, with the ceph_configuration:// prefix.
    For filesystem storage, it is the absolute path of the archive root directory, with the file:/// prefix.

  • pool (string)

    Archiving pool name, usually the tenant name.

  • retention (string)

    Retention time. All data older than the specified retention will be deleted.
    Format: 30s, 1m, 2minutes, 1h, 1d...

  • max_deletion_percentage (decimal)

    Maximum deletion percentage, compared with the whole cluster's data. This safeguard prevents accidental deletion: the operation fails if it is asked to delete more than this percentage of the data. For example, with the default of 1.0, a run that would delete 5% of the data is rejected.
    default: 1.0

  • es_cluster_id (string)

    Elasticsearch cluster used to store the object metadata.

  • es_index (string)

    Elasticsearch index containing the object metadata (a -* suffix will be appended); see the query sketch below.
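
To illustrate the -* suffix, here is a query sketch that fetches one metadata document for the example pool (the mytenant-archive index name comes from the configuration example above; adjust the host and index to your setup):

$ curl "localhost:9200/mytenant-archive-*/_search?size=1&pretty"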

Security Parameters

Here is the same pool configured with security settings:

{
  "archiving_pools": [
    {
      "destinations": ["file:///tmp/storage"],
      "pool": "mytenant-data",
      "retention": "3m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "credentials": {
        "user": "bob",
        "password": "pass"
      },
      "ssl": false
    }
  ]
}
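
Before relying on these credentials, you can check them directly against the cluster with plain curl (assuming the cluster listens on localhost:9200):

$ curl -u bob:pass localhost:9200/_cluster/health?pretty
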
  • credentials.user (string)

    Username used to authenticate against the ES cluster. Requires credentials.password to be set as well.

  • credentials.password (string)

    Password used to authenticate against the ES cluster. Requires credentials.user to be set as well.

  • credentials.token (string)

    Token string used to authenticate against the ES cluster. Requires credentials.token_type to be set as well.

  • credentials.token_type (string)

    Token type used to authenticate against the ES cluster. Requires credentials.token to be set as well.

  • ssl (boolean)

    If true, the connection to the ES cluster is encrypted with TLS; see the TLS sketch after this list.

  • ssl_private_key (string)

    Path to the client's private key for the TLS connection.

  • ssl_certificate (string)

    Path to the client's public key (certificate) for the TLS connection.

  • ssl_trusted_certificate (string)

    Path to the client's CA file for the TLS connection.
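
For a TLS-enabled connection, a pool entry could look like the following sketch (the certificate paths under /opt/certs are illustrative assumptions, not platform defaults):

{
  "archiving_pools": [
    {
      "destinations": ["file:///tmp/storage"],
      "pool": "mytenant-data",
      "retention": "3m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "credentials": {
        "user": "bob",
        "password": "pass"
      },
      "ssl": true,
      "ssl_private_key": "/opt/certs/client-key.pem",
      "ssl_certificate": "/opt/certs/client-cert.pem",
      "ssl_trusted_certificate": "/opt/certs/ca.pem"
    }
  ]
}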

Optional Parameters

  • user (string)

    Only for a Ceph cluster: the user name used to access the Ceph cluster.

Archive permission

For filesystem storage, do not forget to grant read and write permissions on the archive folder: the shiva process owner must have read, write, and delete permissions on the archive data.
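
For example, with the file:///tmp/storage destination used above, and assuming shiva runs as a (hypothetical) punch user, you could grant the permissions like this:

$ sudo chown -R punch:punch /tmp/storage
$ sudo chmod -R u+rwX /tmp/storage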