
Archives Housekeeping

Abstract

This chapter explains how to set up the data lifecycle of your archives.

Overview

If you use the punch archives to store data on long-term object storage, you also need a strategy for deleting data that has become too old. As with Elasticsearch data housekeeping, the punch provides a ready-to-use archive housekeeper job that you can include in one of your channels.

Configuration

First, add the following job to the channel_structure.json descriptor file of your channel:

{
    "version" : "5.0",
    "jobs" : [
        {
            "type" : "shiva",
            "name" : "archives-housekeeping",
            "command" : "archives-housekeeping",
            "args": [
                "archives-housekeeping.json"
            ],
            "resources": [
                "archives-housekeeping.json"
            ],
            "cluster" : "common",
            "shiva_runner_tags" : ["standalone"],
            "quartzcron_schedule" : "0 0 * ? * * *"
        }
    ]
}
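
The quartzcron_schedule value follows the Quartz cron syntax, which has six mandatory fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh field for the year. As a reading aid only, the following Python sketch labels each field of the expression used above. The helper name label_quartz_fields is hypothetical and is not part of the punch.

```python
# Field order of a Quartz cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_quartz_fields(expression):
    """Map each whitespace-separated field of a Quartz cron expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so 'year' is omitted for 6-field expressions.
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for name, value in label_quartz_fields("0 0 * ? * * *").items():
        print(f"{name:>12}: {value}")
```

Read this way, "0 0 * ? * * *" fires at second 0 of minute 0 of every hour, so the housekeeper in the sample above runs once per hour.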

Then provide an archives-housekeeping.json file that defines your settings. Here is an example:

{
  "archiving_pools": [
    {
      "destinations": ["file:///tmp/indexed_filestore"],
      "pool": "mytenant-data",
      "retention": "3m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "user": "admin"
    }
  ]
}

Watch out! This service relies on the Elasticsearch mapping template mapping_archive.json, and in particular on its topic_name field. If this mapping is not loaded yet, load it with this command:

$ curl -X POST localhost:9200/_template/mapping_archive \
       -H "Content-Type: application/json" \
       -d @resources/elasticsearch/mapping_archive.json

Parameters

Mandatory Parameters

  • destinations (array of strings)

    Array of destinations. For a Ceph cluster, each entry is the actual Ceph cluster name, in the format ceph:.
    For file-system storage, each entry is the absolute path of the archive root directory, in the format file:/// (for example file:///tmp/indexed_filestore, as in the sample above).

  • pool (string)

    Archiving pool name, usually the tenant name.

  • retention (string)

    Retention time. All data older than the specified retention is deleted.
    format: 30s, 1m, 2minutes, 1h, 1d...

  • max_deletion_percentage (decimal)

    Maximum percentage of the whole cluster data that a single run may delete. This safeguard prevents accidental deletion: the operation fails if it would delete more than x% of the data.
    default: 1.0

  • es_cluster_id (string)

    Elasticsearch cluster used to store the objects' metadata.

  • es_index (string)

    Elasticsearch index containing the objects' metadata (-* is appended to this name).
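
The max_deletion_percentage safeguard described in the list above can be pictured as follows. This is only an illustrative Python sketch, not the housekeeper's actual implementation: the function name check_deletion_allowed, the use of object counts as the unit of measure, and the 0 to 100 scale (as suggested by the sample value 100) are all assumptions.

```python
def check_deletion_allowed(total_objects, objects_to_delete, max_deletion_percentage):
    """Return the share of data to delete, in percent, or raise if it exceeds the limit.

    Assumption: the limit is expressed on a 0-100 scale and measured in object counts.
    """
    if total_objects <= 0:
        # Nothing is stored, so nothing can be deleted.
        return 0.0
    percentage = 100.0 * objects_to_delete / total_objects
    if percentage > max_deletion_percentage:
        raise RuntimeError(
            f"refusing to delete {percentage:.2f}% of the data "
            f"(limit: {max_deletion_percentage}%)"
        )
    return percentage


if __name__ == "__main__":
    # 5 expired objects out of 1000 stay under a 1.0 limit.
    print(check_deletion_allowed(1000, 5, 1.0))
```

Under these assumptions, a retention value that is set too short by mistake would select most of the pool for deletion and trip the limit instead of silently removing the data.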

Optional Parameters

  • user (string)

    Only for a Ceph cluster: the user name used to access the Ceph cluster.
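
Before deploying the job, it can help to check that every entry of archiving_pools carries the mandatory parameters listed above. Here is a minimal Python sketch, assuming the file layout of the sample archives-housekeeping.json. The function name missing_mandatory_parameters is hypothetical, and this check is not something the punch itself provides.

```python
import json

# Mandatory parameters, as listed in this chapter.
MANDATORY = (
    "destinations", "pool", "retention",
    "max_deletion_percentage", "es_cluster_id", "es_index",
)


def missing_mandatory_parameters(config):
    """Return {pool position: [missing keys]} for pools that lack mandatory keys."""
    problems = {}
    for index, pool in enumerate(config.get("archiving_pools", [])):
        missing = [key for key in MANDATORY if key not in pool]
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    sample = json.loads("""
    {
      "archiving_pools": [
        {
          "destinations": ["file:///tmp/indexed_filestore"],
          "pool": "mytenant-data",
          "retention": "3m",
          "es_cluster_id": "es_search"
        }
      ]
    }
    """)
    # Reports the keys that this deliberately incomplete pool is missing.
    print(missing_mandatory_parameters(sample))
```

An empty result means that every pool defines all six mandatory parameters. The sketch does not validate the values themselves.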

Archive permission

For file-system storage, do not forget to grant read and write permissions on the archive folder. The owner of the shiva process must be able to read, write and delete the archive data.
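
For a file:// destination, these permissions can be verified ahead of time by running a check such as the following as the owner of the shiva process. This is a sketch under assumptions: the helper name check_archive_permissions is hypothetical, and it only inspects the archive root directory, not every file and folder below it.

```python
import os
from urllib.parse import urlparse


def check_archive_permissions(destination):
    """Return the permissions the current user lacks on a file:// archive root."""
    parsed = urlparse(destination)
    if parsed.scheme != "file":
        raise ValueError("only file:// destinations can be checked locally")
    path = parsed.path
    if not os.path.isdir(path):
        return ["directory does not exist"]
    missing = []
    if not os.access(path, os.R_OK):
        missing.append("read")
    # Creating or deleting entries requires write permission on the directory...
    if not os.access(path, os.W_OK):
        missing.append("write")
    # ...and execute permission to traverse it.
    if not os.access(path, os.X_OK):
        missing.append("traverse")
    return missing


if __name__ == "__main__":
    print(check_archive_permissions("file:///tmp/indexed_filestore"))
```

An empty list means the current user can read, write and traverse the archive root. Any other result names what is missing.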