Archives Housekeeping

Abstract

This chapter explains how to set up an archive data lifecycle.

Overview

If you use archives to store data on long-term object storage, you need to define a strategy to clean up old data.

The Punch provides a ready-to-use archives-housekeeping application. You can include it in one of your channels to clean up old data periodically.

Configuration

Here is an example configuration that deletes data older than 10 minutes on a file system and data older than 3 days on Minio:

{
  "archiving_pools": [
    {
      "devices_addresses": ["file:///tmp/storage"],
      "pool": "mytenant-data",
      "retention": "10m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive"
    },
    {
      "devices_addresses": ["http://localhost:9000"],
      "pool": "mytenant-data",
      "retention": "3d",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "access_key": "miniouser",
      "secret_key": "miniopassword"
    }
  ]
}

Run the application manually from an operator environment:

archives-housekeeping /path/to/your/archives-housekeeping.json

To run it periodically, include the application in a Shiva channel:

{
    "version" : "6.0",
    "jobs" : [
        {
            "type" : "shiva",
            "name" : "archives-housekeeping",
            "command" : "archives-housekeeping",
            "args": [
                "archives-housekeeping.json"
            ],
            "resources": [
                "archives-housekeeping.json"
            ],
            "cluster" : "common",
            "shiva_runner_tags" : ["standalone"],
            "quartzcron_schedule" : "0 0 * ? * * *"
        }
    ]
}
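The quartzcron_schedule value uses the Quartz cron syntax, whose fields are, in order: seconds, minutes, hours, day of month, month, day of week and an optional year. "0 0 * ? * * *" therefore fires at the top of every hour. The small Python sketch below only labels each field of such an expression to make it easier to read. describe_quartz is a hypothetical helper, not part of the Punch, and it neither validates values nor computes fire times.

```python
# Field order of a Quartz cron expression (the trailing year field is optional).
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def describe_quartz(expression: str) -> dict:
    """Map each field of a Quartz cron expression to its name.

    Only splits and labels the fields; values are not validated.
    """
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    # zip stops at the shorter sequence, so a 6-field expression has no "year" key.
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for name, value in describe_quartz("0 0 * ? * * *").items():
        print(f"{name:>12}: {value}")
```

Following the same pattern, "0 0 2 ? * * *" would run the housekeeping once a day at 02:00 instead of every hour.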

Parameters

Mandatory Parameters

  • devices_addresses (array of string)

    Array of device addresses.
    For Ceph, the address is the absolute path to the Ceph cluster configuration file, with format ceph_configuration://<path>.
    For File-System, the address is the absolute path to the archive root directory, with format file:///<path>.
    For Minio, the address is the URL of the Minio cluster, with format http://<url>.

  • pool (string)

    Archiving pool name. It is the tenant name by default.

  • retention (string)

    Retention time. All data whose batch.latest_ts is older than the specified retention will be deleted. Available formats:
    - Seconds : 30s, 30secs, 30seconds.
    - Minutes : 1m, 10mins, 2minutes.
    - Hours : 1h, 1hrs, 1hours.
    - Days : 1d, 1day, 1days.

  • max_deletion_percentage (decimal)

    Maximum percentage of the whole cluster data that may be deleted. This safeguard prevents accidental deletion: the operation fails if it would delete more than x% of the data.
    Default: 1.0

  • es_cluster_id (string)

    Elasticsearch cluster used to store object metadata.

  • es_index (string)

    Elasticsearch index containing object metadata. The application appends -* to this value to build an index pattern.
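The retention, max_deletion_percentage and es_index rules above can be modeled in a few lines of Python. This is a sketch, not the Punch implementation: parse_retention, is_expired, check_deletion and metadata_index_pattern are hypothetical helpers, only the retention suffixes listed above are accepted, and max_deletion_percentage is assumed to be on a 0 to 100 scale (so the default 1.0 means 1%, and the value 100 used in the examples allows deleting everything).

```python
import re
from datetime import datetime, timedelta

# Retention suffixes listed in the documentation, mapped to timedelta keywords.
_UNITS = {
    "s": "seconds", "secs": "seconds", "seconds": "seconds",
    "m": "minutes", "mins": "minutes", "minutes": "minutes",
    "h": "hours", "hrs": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
}
_RETENTION_RE = re.compile(r"(\d+)([a-z]+)")


def parse_retention(value: str) -> timedelta:
    """Convert a retention string such as "10m" or "3d" into a timedelta."""
    match = _RETENTION_RE.fullmatch(value)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"invalid retention: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def is_expired(latest_ts: datetime, retention: str, now: datetime) -> bool:
    """True if a batch whose batch.latest_ts is latest_ts is older than the retention."""
    return latest_ts < now - parse_retention(retention)


def check_deletion(to_delete: int, total: int, max_deletion_percentage: float = 1.0) -> float:
    """Return the percentage that would be deleted, or raise if it exceeds the limit.

    Assumption: 0 to 100 scale. What is counted (objects, bytes, ...) is
    left to the caller.
    """
    if total <= 0:
        return 0.0  # empty cluster: nothing to protect
    percentage = 100.0 * to_delete / total
    if percentage > max_deletion_percentage:
        raise RuntimeError(
            f"refusing to delete {percentage:.2f}% of the data "
            f"(max_deletion_percentage={max_deletion_percentage})"
        )
    return percentage


def metadata_index_pattern(es_index: str) -> str:
    """Index pattern built from es_index by appending "-*"."""
    return f"{es_index}-*"
```

With the example configuration, parse_retention("10m") and parse_retention("3d") give 10 minutes and 3 days. In this model a batch is selected only when its batch.latest_ts is strictly earlier than "now minus retention", and the whole run is refused if the selection exceeds max_deletion_percentage.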

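Security Parameters

These parameters secure the connection to the Elasticsearch cluster that stores the archive metadata. The following example authenticates with a user name and password, without TLS: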

{
  "archiving_pools": [
    {
      "devices_addresses": ["file:///tmp/storage"],
      "pool": "mytenant-data",
      "retention": "3m",
      "max_deletion_percentage": 100,
      "es_cluster_id": "es_search",
      "es_index": "mytenant-archive",
      "credentials": {
        "user": "bob",
        "password": "pass"
      },
      "ssl": false
    }
  ]
}

  • credentials.user (string)

    User name used to authenticate to the ES cluster. Requires credentials.password.

  • credentials.password (string)

    Password used to authenticate to the ES cluster. Requires credentials.user.

  • credentials.token (string)

    Token used to authenticate to the ES cluster. Requires credentials.token_type.

  • credentials.token_type (string)

    Type of the token used to authenticate to the ES cluster. Requires credentials.token.

  • ssl (Boolean)

    If true, encrypt the connection to the ES cluster with TLS.

  • ssl_private_key (String)

    Path to the client private key used for the TLS connection.

  • ssl_certificate (String)

    Path to the client certificate (public key) used for the TLS connection.

  • ssl_trusted_certificate (String)

    Path to the CA certificate file that the client trusts for the TLS connection.
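The pairing rules above (user with password, token with token_type) are easy to get wrong when editing a configuration by hand. The Python sketch below checks them for one pool entry. check_security is a hypothetical helper, not a Punch tool. Two further assumptions are labeled in the code: ssl is treated as false when absent, and, when ssl is true, a client private key and client certificate are expected together, because a TLS client certificate is only usable with its matching private key.

```python
def check_security(pool: dict) -> list:
    """Return human-readable problems found in a pool's security settings."""
    problems = []
    credentials = pool.get("credentials", {})
    # Each documented credential key requires its partner key.
    for first, second in (("user", "password"), ("token", "token_type")):
        if (first in credentials) != (second in credentials):
            missing = second if first in credentials else first
            problems.append(f"credentials.{missing} is required")
    # Assumption: ssl defaults to false when it is not set.
    ssl = pool.get("ssl", False)
    if not isinstance(ssl, bool):
        problems.append("ssl must be a boolean")
    elif ssl and ("ssl_private_key" in pool) != ("ssl_certificate" in pool):
        # Assumption: a client certificate is only usable with its private key.
        problems.append("ssl_private_key and ssl_certificate must be set together")
    return problems
```

Running it on the pool from the example above returns an empty list. Dropping "password" from that pool returns ["credentials.password is required"].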

Optional Parameters

  • user (string)

    Only for Ceph. Ceph user name.

  • access_key (string)

    Only for Minio. Minio access key.

  • secret_key (string)

    Only for Minio. Minio secret key.
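Since the optional parameters each apply to one backend only, and the backend is determined by the scheme of each devices_addresses entry, a configuration can be sanity-checked by mapping schemes to backends. The Python sketch below is hypothetical (device_backend and misplaced_optional_keys are not Punch functions) and recognizes only the three schemes documented under devices_addresses. It splits on "://" instead of using urllib.parse because "_" is not a valid URL scheme character, so urlsplit would not extract "ceph_configuration" as a scheme.

```python
# Scheme prefixes from the devices_addresses documentation.
BACKENDS = {"ceph_configuration": "Ceph", "file": "File-System", "http": "Minio"}

# Optional keys that apply to each backend (see Optional Parameters).
OPTIONAL_KEYS = {
    "Ceph": {"user"},
    "Minio": {"access_key", "secret_key"},
    "File-System": set(),
}


def device_backend(address: str) -> str:
    """Return the backend named by a devices_addresses entry."""
    scheme, sep, _rest = address.partition("://")
    if not sep or scheme not in BACKENDS:
        raise ValueError(f"unsupported device address: {address!r}")
    return BACKENDS[scheme]


def misplaced_optional_keys(pool: dict) -> set:
    """Return optional keys set on a pool whose devices do not use them."""
    allowed = set()
    for address in pool["devices_addresses"]:
        allowed |= OPTIONAL_KEYS[device_backend(address)]
    every_optional = set().union(*OPTIONAL_KEYS.values())
    return {key for key in pool if key in every_optional and key not in allowed}
```

For the Minio pool of the configuration example, misplaced_optional_keys returns an empty set. Adding "access_key" to the file-system pool would flag it.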

Warnings

Archive permission

For file-system archives, make sure the archive folder is readable and writable: the user that runs the Shiva process must have read, write and delete permissions on the archive data.
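Before scheduling the job, you can check these permissions from the account that runs Shiva. The Python sketch below is a hypothetical helper, not a Punch tool. It relies on the POSIX rule that deleting a file requires write and execute permission on its parent directory, so it checks directories for read, write and execute and files for read and write. os.access evaluates the real user ID of the calling process, so run the sketch as the Shiva process owner; note that it always succeeds for root.

```python
import os


def permission_problems(archive_root: str) -> list:
    """List paths under archive_root that the current user cannot read, write or delete in."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(archive_root):
        # Creating or deleting entries needs W and X on the directory itself.
        if not os.access(dirpath, os.R_OK | os.W_OK | os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.W_OK):
                problems.append(path)
    return problems


if __name__ == "__main__":
    # /tmp/storage is the archive root used in the configuration example.
    for path in permission_problems("/tmp/storage"):
        print("insufficient permissions:", path)
```

An empty result means the account can read, write and delete everything under the archive root. os.walk silently skips directories it cannot list, so a directory reported here may also hide further problems below it.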

Elasticsearch archives mapping

This application relies on the Elasticsearch mapping template mapping_archive.json, in particular on its topic_name field.

If you need to load this mapping, use this command:

curl -X POST localhost:9200/_template/mapping_archive \
  -H "Content-Type: application/json" \
  -d @$PUNCHPLATFORM_CONF_DIR/resources/elasticsearch/mapping_archive.json
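If you maintain a customized copy of mapping_archive.json, you may want to confirm that it still declares topic_name before loading it. The exact nesting of fields in a template depends on the Elasticsearch version, so the hypothetical Python sketch below searches the parsed JSON recursively instead of assuming a path. It only checks that the key name exists, not its type.

```python
import json


def has_field(node, field: str) -> bool:
    """Recursively look for `field` as a key anywhere in a parsed mapping."""
    if isinstance(node, dict):
        return field in node or any(has_field(v, field) for v in node.values())
    if isinstance(node, list):
        return any(has_field(item, field) for item in node)
    return False


def template_has_topic_name(path: str) -> bool:
    """Load a mapping template file and check that it declares topic_name."""
    with open(path, encoding="utf-8") as f:
        return has_field(json.load(f), "topic_name")
```

For example, template_has_topic_name("mapping_archive.json") should return True for a template this application can work with.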