Archives Housekeeping¶
Overview¶
If you use archives to store data on long-term object storage, you need a strategy for cleaning up old data.
The Punch provides a ready-to-use archives-housekeeping application.
It can be included in one of your channels to clean old data periodically.
Prerequisites¶
Elastic mapping¶
The archives-housekeeping application reads archive metadata from Elasticsearch.
When archiving, metadata should be indexed in an index configured with the mapping_archive.json mapping.
Here is the command to load this mapping on the Standalone:
curl -X POST localhost:9200/_template/mapping_archive \
-H "Content-Type: application/json" \
-d @$PUNCHPLATFORM_CONF_DIR/resources/elasticsearch/mapping_archive.json
Permissions¶
For filesystem storage, do not forget to grant read and write permissions on the archive folder.
When running this application in Shiva, the Punch daemon user must have those permissions on the archive data.
Configuration¶
Here is a configuration example from the standalone that cleans:
- data older than 1 hour on the file system
- data older than 3 days on Minio
archiving_pools:
  # File system
  - devices_addresses:
      - file:///tmp/archive-logs/storage
    pool: mytenant
    retention: 1h
    max_deletion_percentage: 100
    es_cluster_id: common
    es_index: mytenant-archive
    es_bulk_size: 1000
    es_timeout: 10s
    delete_metadata: all_devices
  # Minio
  - devices_addresses:
      - http://localhost:9200
    pool: mytenant
    retention: 3d
    max_deletion_percentage: 100
    es_cluster_id: common
    es_index: mytenant-archive
    es_bulk_size: 1000
    es_timeout: 10s
    delete_metadata: all_devices
    access_key: minioadmin
    secret_key: minioadmin
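Before deploying a configuration like this one, it can help to check each pool entry against the mandatory parameters documented on this page. The following is a minimal sketch and not part of the Punch tooling: `check_pool`, `backend_of` and the backend labels are hypothetical names, and the pool is assumed to be already loaded into a Python dictionary.

```python
# Mandatory keys as listed in the "Mandatory Parameters" section of this page.
MANDATORY_KEYS = (
    "devices_addresses",
    "pool",
    "retention",
    "max_deletion_percentage",
    "es_cluster_id",
    "es_index",
)

# Address schemes documented on this page, mapped to a backend label.
# The labels are hypothetical names chosen for this sketch.
SCHEME_TO_BACKEND = {
    "file": "filesystem",
    "ceph_configuration": "ceph",
    "http": "minio",
}


def backend_of(address):
    """Return the backend label for a device address, or None if unknown."""
    scheme, separator, _ = address.partition("://")
    return SCHEME_TO_BACKEND.get(scheme) if separator else None


def check_pool(pool):
    """Return a list of human-readable problems found in one pool entry."""
    problems = [
        f"missing mandatory key: {key}"
        for key in MANDATORY_KEYS
        if key not in pool
    ]
    for address in pool.get("devices_addresses", []):
        if backend_of(address) is None:
            problems.append(f"unknown address scheme: {address}")
    return problems
```

An empty list means the entry carries every mandatory key and only documented address schemes. The sketch does not check that values are valid.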
Parameters¶
Mandatory Parameters¶
- devices_addresses (array of string): Array of device addresses.
  - For Ceph, the address is the absolute path to the Ceph cluster configuration file, with format ceph_configuration://<path>.
  - For File-System, the address is the absolute path of the archive root directory, with format file:///<path>.
  - For Minio, the address is the URL of the Minio cluster, with format http://<url>.
- pool (string): Archiving pool name. It is the tenant name by default.
- retention (string): Retention time. All data with batch.latest_ts older than the specified retention will be deleted. Available formats:
  - Seconds: 30s, 30secs, 30seconds.
  - Minutes: 1m, 10mins, 2minutes.
  - Hours: 1h, 1hrs, 1hours.
  - Days: 1d, 1day, 1days.
- max_deletion_percentage (decimal): Maximum deletion percentage compared with the whole cluster data. This feature exists to prevent accidental deletion: the operation fails if it is asked to delete more than this percentage of the data. Set to 100.0 to disable this protection.
- es_cluster_id (string): Elasticsearch cluster used to store object metadata.
- es_index (string): Elasticsearch index containing object metadata (-* will be appended).
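The retention and max_deletion_percentage rules above can be sketched as follows. This is an illustration under stated assumptions, not the application's actual code: `parse_retention`, `is_expired` and `check_deletion_allowed` are hypothetical helpers, and the unit in which "whole cluster data" is measured (objects or bytes) is left to the caller because this page does not specify it.

```python
import re
from datetime import datetime, timedelta

# Unit spellings from the retention formats above, mapped to timedelta keywords.
_UNITS = {
    "s": "seconds", "secs": "seconds", "seconds": "seconds",
    "m": "minutes", "mins": "minutes", "minutes": "minutes",
    "h": "hours", "hrs": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
}
_RETENTION = re.compile(r"(\d+)([a-z]+)")


def parse_retention(text):
    """Convert a retention string such as '1h' or '3d' into a timedelta."""
    match = _RETENTION.fullmatch(text.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported retention: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def is_expired(latest_ts, retention, now):
    """True when latest_ts is older than the retention window ending at now."""
    return latest_ts < now - parse_retention(retention)


def check_deletion_allowed(to_delete, total, max_percentage):
    """Raise if deleting to_delete out of total exceeds max_percentage."""
    if total == 0:
        return  # nothing stored, nothing to protect
    percentage = 100.0 * to_delete / total
    if percentage > max_percentage:
        raise RuntimeError(
            f"refusing to delete {percentage:.1f}% of data "
            f"(limit is {max_percentage}%)"
        )
```

With max_percentage set to 100.0 the guard can never trigger, which matches the documented way of disabling the protection.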
Security Parameters¶
- credentials.user (string): Username used to authenticate to the ES cluster. Requires credentials.password to be set.
- credentials.password (string): Password used to authenticate to the ES cluster. Requires credentials.user to be set.
- credentials.token (string): Token string used to authenticate to the ES cluster. Requires credentials.token_type to be set.
- credentials.token_type (string): Token type used to authenticate to the ES cluster. Requires credentials.token to be set.
- ssl (boolean): If true, encrypt the connection to the ES cluster with TLS.
- ssl_private_key (string): Path to the client's private key for the TLS connection.
- ssl_private_key_password (string): Password for the client's private key for the TLS connection.
- ssl_private_key_alias (string): Alias for the client's private key for the TLS connection.
- ssl_certificate (string): Path to the client's certificate for the TLS connection.
- ssl_trusted_certificate (string): Path to the client's CA file for the TLS connection.
- ssl_keystore_location (string): Path to the client's keystore for the TLS connection.
- ssl_keystore_password (string): Password for the client's keystore for the TLS connection.
- ssl_truststore_location (string): Path to the client's truststore for the TLS connection.
- ssl_truststore_password (string): Password for the client's truststore for the TLS connection.
- user (string): Only for Ceph. Ceph user name.
- access_key (string): Only for Minio. Minio access key.
- secret_key (string): Only for Minio. Minio secret key.
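The credentials parameters above come in pairs: each one is only usable when its counterpart is also set. A minimal sketch of that pairing check is shown below. `check_credentials` is a hypothetical helper, and the settings are assumed to be available as a flat dictionary keyed by the dotted parameter names, which may not match how the application loads them.

```python
# Pairs of parameters that must be configured together, per the list above.
CREDENTIAL_PAIRS = (
    ("credentials.user", "credentials.password"),
    ("credentials.token", "credentials.token_type"),
)


def check_credentials(settings):
    """Return one message per credential parameter set without its partner."""
    problems = []
    for first, second in CREDENTIAL_PAIRS:
        has_first = first in settings
        has_second = second in settings
        if has_first != has_second:
            present, missing = (first, second) if has_first else (second, first)
            problems.append(f"{present} is set but {missing} is missing")
    return problems
```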
Optional Parameters¶
- delete_metadata (string): Deletion strategy regarding metadata. Default: "always". Possible values:
  - "always": metadata is always deleted after processing devices.
  - "never": metadata is never deleted after processing devices.
  - "all_devices": metadata is deleted only when all devices are cleaned by housekeeping. If some devices were not cleaned (because they were not specified or because the application failed to clean the device), the application updates the metadata to keep the devices where the archive is still present. This is the recommended behavior for production.
- es_bulk_size (int): Size of the Elasticsearch bulk request containing delete and update metadata actions. Default: 1000
- es_timeout (string): Elasticsearch request timeout. Default: 10s
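The three delete_metadata strategies can be read as a small decision function. The sketch below only illustrates the documented behavior and is not the application's implementation: `metadata_action` and its return labels ("delete", "keep", "update") are hypothetical, and devices are represented as plain sets of address strings.

```python
def metadata_action(strategy, recorded_devices, cleaned_devices):
    """Decide what happens to one archive's metadata after a housekeeping run.

    recorded_devices: devices the metadata says hold the archive.
    cleaned_devices: devices the run actually cleaned.
    Returns (action, devices_left_in_metadata).
    """
    if strategy == "always":
        return ("delete", set())
    if strategy == "never":
        return ("keep", set(recorded_devices))
    if strategy == "all_devices":
        remaining = set(recorded_devices) - set(cleaned_devices)
        if not remaining:
            return ("delete", set())
        # Some devices still hold the archive: keep only those in the metadata.
        return ("update", remaining)
    raise ValueError(f"unknown delete_metadata strategy: {strategy!r}")
```

For example, with "all_devices", an archive recorded on two devices of which only one was cleaned keeps its metadata, updated to list the device that was not cleaned.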
Running in foreground¶
Run the application in the foreground on a Punch Operator:
archives-housekeeping /path/to/your/archives-housekeeping.yaml
Running in Shiva¶
Include the application in a Shiva channel. Example from the standalone:
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
applications:
  - name: elasticsearch-housekeeping
    runtime: shiva
    cluster: common
    command: elasticsearch-housekeeping
    args:
      - --tenant-configuration-path
      - elasticsearch-housekeeping.json
    apply_resolver_on:
      - elasticsearch-housekeeping.json
    quartzcron_schedule: 0 * * ? * * *
  - name: archives-housekeeping
    runtime: shiva
    cluster: common
    command: archives-housekeeping
    args:
      - archives-housekeeping.yaml
      - --childopts
      - -Xms100m -Xmx500m
    quartzcron_schedule: 0 * * ? * * *
resources: [ ]
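The quartzcron_schedule values in this channel are easier to read once each field is labeled. Assuming they follow the standard Quartz cron field order (seconds, minutes, hours, day of month, month, day of week, optional year), the small sketch below names each field. `label_quartz_fields` is a hypothetical helper written for this page. It does not validate or evaluate the expression.

```python
# Standard Quartz cron field order; the seventh field (year) is optional.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",
)


def label_quartz_fields(expression):
    """Map each whitespace-separated field of a Quartz expression to its name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))
```

Read this way, `0 * * ? * * *` sets seconds to 0 and leaves minutes and hours unrestricted, so under that assumption the application would be triggered once per minute. Adjust the minutes and hours fields for a less frequent schedule.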