Archives Housekeeping

Abstract

This chapter explains how to set up the data lifecycle of your archives.

Overview

If you use the punch archives to store data on long-term object storage, you also need a strategy for deleting data that is too old. As with Elasticsearch data housekeeping, the punch provides a ready-to-use archive housekeeper job that you can include in one of your channels.

Configuration

First, add the following job to the channel_structure.json descriptor file of your channel. With the Quartz cron expression shown here, the job runs at the start of every hour:

{
    "version" : "5.0",
    "jobs" : [
        {
            "type" : "shiva",
            "name" : "archives-housekeeping",
            "command" : "archives-housekeeping",
            "args": [
                "archives-housekeeping.json"
            ],
            "resources": [
                "archives-housekeeping.json"
            ],
            "cluster" : "common",
            "shiva_runner_tags" : ["standalone"],
            "quartzcron_schedule" : "0 0 * ? * * *"
        }
    ]
}

Then provide an archives-housekeeping.json file alongside it to define your settings. Here is an example:

{
    "archiving_pools": [
        {
            "cluster_name": "/tmp/indexed_filestore",
            "pool_name": "mytenant-data",
            "techno": "indexed_filestore",
            "retention": "3m",
            "max_deletion_percentage": 100,
            "es_cluster_id": "es_search",
            "es_index_prefix": "",
            "es_index_suffix": "",
            "user": "admin"
        }
    ]
}

Watch out! This service relies on the Elasticsearch mapping template objects-storage-template.json, and in particular on its topic_name field. If you need to load this mapping, use this command:

$ curl -X POST localhost:9200/_template/objects-storage \
       -H "Content-Type: application/json" \
       -d @resources/elasticsearch/objects-storage-template.json
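Before scheduling the job, it can be useful to confirm that the template is actually present. The following is a minimal sketch, not part of the punch tooling: the template_loaded function name is hypothetical, and it assumes the same localhost:9200 node and legacy _template endpoint used by the command above.

```shell
# Hypothetical helper (not part of the punch tooling): succeeds when the named
# legacy index template exists on the given Elasticsearch node.
template_loaded() {
    es_url="$1"      # e.g. http://localhost:9200
    template="$2"    # e.g. objects-storage
    # --fail turns an HTTP error such as 404 (template absent) into a
    # non-zero exit status; an unreachable node also yields a non-zero status.
    curl --silent --fail --max-time 5 --output /dev/null \
        "$es_url/_template/$template"
}

if template_loaded "http://localhost:9200" "objects-storage"; then
    echo "template objects-storage is loaded"
else
    echo "template objects-storage is missing or Elasticsearch is unreachable"
fi
```

A non-zero status does not distinguish a missing template from an unreachable node, so check connectivity first if the result is unexpected.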

Parameters

Mandatory Parameters

  • cluster_name (string)

    For a Ceph cluster, this is the actual Ceph cluster name.
    For file-system storage, this is the absolute path of the archive root directory.

  • pool_name (string)

    Archiving pool name, usually <tenant>-data.

  • techno (string)

    Archiving technology:
    ceph for a real Ceph archiving system,
    indexed_filestore for a file-system archiving system.

  • retention (string)

    Retention period. All data older than the specified retention is deleted.
    Format: 30s, 1m, 2minutes, 1h, 1d...

  • max_deletion_percentage (decimal)

    Maximum share of the whole cluster data that a single run may delete, as a percentage. This safeguard prevents accidental deletion: the operation fails if it would delete more than this percentage of the data.
    Default: 1.0

  • es_cluster_id (string)

    Elasticsearch cluster used to store objects meta-data.
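The max_deletion_percentage safeguard described above can be pictured with a small arithmetic sketch. This is an illustration only, not the housekeeper's implementation: the deletion_allowed function is hypothetical, and it assumes that the value is expressed on a 0 to 100 scale (as in the example settings file) and that the share is computed from simple counts.

```shell
# Hypothetical illustration of the safeguard, not the actual housekeeper code.
# Exit status 0 means the deletion stays within the limit, 1 means it is refused.
deletion_allowed() {
    to_delete="$1"   # amount of data selected for deletion (assumed unit: objects)
    total="$2"       # total amount of data in the pool (same unit)
    max_pct="$3"     # max_deletion_percentage, assumed to be on a 0-100 scale
    awk -v d="$to_delete" -v t="$total" -v m="$max_pct" \
        'BEGIN { if (t <= 0) exit 1; exit ((d * 100 / t) <= m ? 0 : 1) }'
}

# 50 objects out of 1000 is 5% of the data
if deletion_allowed 50 1000 1.0; then
    echo "deletion allowed"
else
    echo "deletion refused: above max_deletion_percentage"
fi
```

Under these assumptions, a limit of 1.0 refuses the deletion above (5% of the data), while a limit of 100 accepts any deletion, which effectively disables the safeguard.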

Optional Parameters

  • es_index_prefix (string)

    Optional prefix of the name of the Elasticsearch index containing the objects meta-data.

  • es_index_suffix (string)

    Optional suffix of the name of the Elasticsearch index containing the objects meta-data.

  • es_timeout (string)

    Timeout of the Elasticsearch requests that fetch objects meta-data.
    Default: "60"

  • user (string)

    For a Ceph cluster only: the user name used to access the Ceph cluster.

Archive permission

With indexed_filestore, do not forget to grant the required permissions on the archive folder: the shiva process owner must have read, write and delete permissions on the archive data.
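A quick way to check these permissions is to run a small probe as the shiva process owner. The sketch below is an assumption-laden example: check_archive_perms is a hypothetical helper, /tmp/indexed_filestore is the archive root taken from the example settings, and only the root directory itself is probed, not every file beneath it.

```shell
# Hypothetical helper: verifies that the current user can read, write and
# delete inside the given archive root directory.
check_archive_perms() {
    dir="$1"
    # The directory must exist and be readable, writable and traversable
    if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        return 1
    fi
    # Create then delete a probe file to confirm write and delete rights
    probe="$dir/.housekeeping_probe.$$"
    ( : > "$probe" ) 2>/dev/null || return 1
    rm "$probe"
}

# Run this as the user that owns the shiva process
if check_archive_perms "/tmp/indexed_filestore"; then
    echo "archive root permissions look sufficient"
else
    echo "missing read, write or delete permission on the archive root"
fi
```

If the check fails, adjust ownership or mode of the archive directory with your usual tools (for example chown or chmod) so that the shiva process owner has the rights listed above.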