If you use archives to store data on long-term object storage, you need a strategy for cleaning up old data.
The Punch provides a ready-to-use
It can be included in one of your channels to clean old data periodically.
The archives-housekeeping application reads archive metadata from Elasticsearch.
When archiving, metadata should be indexed on an index configured with
Here is the command to load this mapping on the Standalone:
    curl -X POST localhost:9200/_template/mapping_archive \
      -H "Content-Type: application/json" \
      -d @$PUNCHPLATFORM_CONF_DIR/resources/elasticsearch/mapping_archive.json
When archiving to a filesystem, do not forget to grant read and write permissions on the archive folder.
When running this application in Shiva, the Punch daemon user must have those permissions on the archive data.
Here is a configuration example from the standalone that cleans:
- data older than 1 hour on File System
- data older than 3 days on Minio
    archiving_pools:
      # File system
      - devices_addresses:
          - file:///tmp/archive-logs/storage
        pool: mytenant
        retention: 1h
        max_deletion_percentage: 100
        es_cluster_id: common
        es_index: mytenant-archive
        es_bulk_size: 1000
        es_timeout: 10s
        delete_metadata: all_devices
      # Minio
      - devices_addresses:
          - http://localhost:9200
        pool: mytenant
        retention: 3d
        max_deletion_percentage: 100
        es_cluster_id: common
        es_index: mytenant-archive
        es_bulk_size: 1000
        es_timeout: 10s
        delete_metadata: all_devices
        access_key: minioadmin
        secret_key: minioadmin
devices_addresses (array of string)
Array of devices addresses.
For Ceph, the address is the absolute path to the Ceph cluster configuration file.
For File-System, the address is the absolute path of the archive root directory with format
For Minio, the address is the URL to the Minio cluster with format
Archiving pool name. It is the tenant name by default.
Retention time. All data with batch.latest_ts older than the specified retention will be deleted. Available formats:
- Seconds :
- Minutes :
- Hours :
- Days :
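As an illustration, the retention rule can be sketched in Python. This is not the application's own code: the suffixes `s`, `m`, `h` and `d` are assumptions inferred from the `1h` and `3d` values used in the example configuration, and `parse_retention` and `is_expired` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed suffixes: only "h" and "d" appear in the example configuration.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_retention(value: str) -> timedelta:
    """Turn a retention string such as '1h' or '3d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported retention format: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(latest_ts: datetime, retention: str, now: datetime) -> bool:
    """Return True when the batch's latest timestamp is older than the retention."""
    return latest_ts < now - parse_retention(retention)


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
print(is_expired(now - timedelta(hours=2), "1h", now))  # True
print(is_expired(now - timedelta(days=1), "3d", now))   # False
```

A batch whose latest timestamp is two hours old is past a `1h` retention, while a one-day-old batch is still within a `3d` retention.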
Maximum deletion percentage compared with the whole cluster data. This feature exists to prevent accidental deletion: the operation fails if it would delete more than x% of the data.
Set it to 100.0 to disable this protection.
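The safeguard can be pictured with a minimal Python sketch. It assumes the percentage is computed on object counts and that the limit is exceeded only when the ratio is strictly greater than the configured value; the source does not specify either point, and `check_deletion_allowed` is a hypothetical name.

```python
def check_deletion_allowed(to_delete: int, total: int,
                           max_deletion_percentage: float) -> float:
    """Return the deletion percentage, or raise if it exceeds the limit.

    Assumption: the ratio is computed on object counts, not on bytes.
    """
    if total == 0:
        return 0.0
    percentage = 100.0 * to_delete / total
    if percentage > max_deletion_percentage:
        raise RuntimeError(
            f"refusing to delete {percentage:.1f}% of the data "
            f"(limit: {max_deletion_percentage}%)"
        )
    return percentage


print(check_deletion_allowed(30, 200, 50.0))    # 15.0
print(check_deletion_allowed(200, 200, 100.0))  # 100.0
```

With a limit of 100.0 the check can never fail, which matches the idea that this value disables the protection.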
Elasticsearch cluster used to store object metadata.
Elasticsearch index containing objects meta-data (will be appended
Username to authenticate to ES cluster. Needs
Password to authenticate to ES cluster. Needs
Token string to authenticate to ES cluster. Needs
Token type used to authenticate to ES cluster. Needs
If true, encrypt the connection to the ES cluster with TLS
Path to the client's private key for TLS connection
Password for client's private key for TLS connection
Alias for client's private key for TLS connection
Path to the client's certificate for TLS connection
Path to the client's CA file for TLS connection
Path to the client's keystore for TLS connection
Password for client's keystore for TLS connection
Path to the client's truststore for TLS connection
Password for client's truststore for TLS connection
Only for Ceph. Ceph user name.
Only for Minio. Minio access key.
Only for Minio. Minio secret key.
Deletion strategy regarding metadata. Default: "always". Possible values:
- "always": metadata is always deleted after processing devices.
- "never": metadata is never deleted after processing devices.
- "all_devices": metadata is deleted only when all devices are cleaned by housekeeping. If some devices were not cleaned (because they were not specified or because the application failed to clean the device) then the application updates the metadata to keep the device where the archive is still present. This is the recommended behavior for production.
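The three strategies can be summarized with a short Python sketch. It only models the decision described above and is not the application's implementation; `resolve_metadata_action` and its return values are hypothetical, and the device addresses are taken from the example configuration.

```python
def resolve_metadata_action(strategy, archive_devices, cleaned_devices):
    """Decide what happens to one archive's metadata after housekeeping.

    Returns ("delete", []), ("keep", devices) or ("update", remaining_devices).
    """
    if strategy == "always":
        return ("delete", [])
    if strategy == "never":
        return ("keep", list(archive_devices))
    if strategy == "all_devices":
        # Devices that still hold the archive after this run.
        remaining = [d for d in archive_devices if d not in cleaned_devices]
        if not remaining:
            return ("delete", [])
        return ("update", remaining)
    raise ValueError(f"unknown delete_metadata strategy: {strategy!r}")


fs = "file:///tmp/archive-logs/storage"
minio = "http://localhost:9200"

print(resolve_metadata_action("all_devices", [fs, minio], {fs}))
# ('update', ['http://localhost:9200'])
print(resolve_metadata_action("all_devices", [fs, minio], {fs, minio}))
# ('delete', [])
```

In the first call only the filesystem device was cleaned, so the metadata is updated to keep the remaining device instead of being deleted.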
Size of the Elasticsearch bulk requests containing delete and update metadata actions. Default: 1000
Elasticsearch request timeout. Default: 10s
Running in foreground
Run the application in the foreground on a Punch operator:
Running in Shiva
Include the application in a Shiva channel. Example from the standalone:
    version: '6.0'
    start_by_tenant: true
    stop_by_tenant: true
    applications:
      - name: elasticsearch-housekeeping
        runtime: shiva
        cluster: common
        command: elasticsearch-housekeeping
        args:
          - --tenant-configuration-path
          - elasticsearch-housekeeping.json
        apply_resolver_on:
          - elasticsearch-housekeeping.json
        quartzcron_schedule: 0 * * ? * * *
      - name: archives-housekeeping
        runtime: shiva
        cluster: common
        command: archives-housekeeping
        args:
          - archives-housekeeping.yaml
          - --childopts
          - -Xms100m -Xmx500m
        quartzcron_schedule: 0 * * ? * * *
    resources: [ ]
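To read the schedule used in this example, the following Python sketch labels each field of the expression. It assumes `quartzcron_schedule` follows the standard Quartz cron field order (seconds first, optional year last); `describe_quartz_schedule` is a hypothetical helper that only splits the expression and does not validate it.

```python
# Standard Quartz cron field order; the year field is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def describe_quartz_schedule(expression: str) -> dict:
    """Map each field of a Quartz cron expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


schedule = describe_quartz_schedule("0 * * ? * * *")
print(schedule["seconds"], schedule["minutes"], schedule["day_of_month"])
# 0 * ?
```

Under that assumption, `0 * * ? * * *` fires at second 0 of every minute, so both housekeeping applications run once per minute in this standalone example.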