HOWTO configure Kafka retention

Why do that

Deploying a Kafka cluster in a PunchPlatform involves several steps.

  • The first step is to provide sufficient storage.
  • The second step is to configure the main Kafka retention, before the exact number of topics (which depends on the technologies collected) and their volume in EPS are known.
  • The third step comes just before going to production: the integrator MUST adapt the Kafka retention so that it is sized for each topic.

Set the main retention settings

These settings are applied when the PunchPlatform stack is deployed through the punchplatform-cluster tool. They can be updated in the punchplatform.properties file.

"kafka" : {
    "clusters" : {
        "local" : {
            "partition_retention_bytes" : 1073741824,
            "partition_retention_hours" : 24 
        }
    }
}
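
As a sanity check on the values above, 1073741824 bytes is exactly 1 GiB. The sketch below estimates the disk space the cluster may need if every partition fills up to that size limit. It assumes that partition_retention_bytes applies to each partition individually, as its name suggests. The topic count, partition count and replication factor are hypothetical values chosen for illustration, not values from this platform.

```shell
#!/bin/sh
# Size limit taken from the configuration above (1 GiB).
partition_retention_bytes=1073741824

# Hypothetical cluster layout, for illustration only.
topics=10
partitions_per_topic=4
replication_factor=2

# Approximate worst case: every replica of every partition holds the full limit.
total_bytes=$((partition_retention_bytes * partitions_per_topic * topics * replication_factor))
total_gib=$((total_bytes / 1024 / 1024 / 1024))

echo "approximate worst-case footprint: ${total_gib} GiB"
```

With these illustrative numbers the estimate is 80 GiB, which should be compared with the storage provided in the first step.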

Then, update the platform through the punchplatform-cluster tool.

$ # Run with --check --diff first, before the permanent deployment!
$ punchplatform-deployer.sh deploy -Kk -t kafka --diff

Set the specific retention settings for each topic

First, check the number of topics and the number of partitions of each topic.

$ # from the PunchPlatform operator environment
$ punchplatform-kafka-topics.sh --describe

Second, fill in the following Excel file.

Sizing File
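
The sizing file itself is not reproduced here. As a rough illustration of the kind of calculation involved, the sketch below derives a per-partition retention.bytes value from an event rate. The formula (EPS × average event size × retention period ÷ number of partitions) and the function name `retention_bytes_per_partition` are assumptions made for this example. They are not taken from the sizing file, and the formula ignores compression and uneven partition load.

```shell
#!/bin/sh
# Hypothetical helper: estimate retention.bytes for one partition of a topic.
# Arguments: events per second, average event size in bytes,
#            retention in hours, number of partitions.
retention_bytes_per_partition() {
    eps=$1
    avg_event_bytes=$2
    retention_hours=$3
    partitions=$4
    echo $((eps * avg_event_bytes * retention_hours * 3600 / partitions))
}

# Example: 100 EPS, 400-byte events, 24 hours of retention, 4 partitions.
retention_bytes_per_partition 100 400 24 4
```

For the example inputs the function prints 864000000, which stays below the 1073741824 bytes used as the main setting earlier on this page.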

Then, update the production configuration of each topic with the settings shown in the red square.

$ # from the PunchPlatform operator environment
$ punchplatform-kafka-topics.sh --kafkaCluster <cluster> --topic <topic_name> --alter --config retention.bytes=NNNNNNNNNNN

!!! warning "The retention.bytes of each topic must not exceed the main setting. The main setting must therefore be at least the largest retention.bytes of any topic."
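
Before altering the topics, each planned value can be checked against the main setting. The following is a minimal sketch: the topic names and byte values are hypothetical, the function name `fits_main_retention` is made up for this example, and the main limit is the partition_retention_bytes value from the example configuration.

```shell
#!/bin/sh
# Main limit, taken from partition_retention_bytes in the example configuration.
main_retention_bytes=1073741824

# Succeeds (exit status 0) when a topic value fits within the main limit.
fits_main_retention() {
    topic_bytes=$1
    [ "$topic_bytes" -le "$main_retention_bytes" ]
}

# Hypothetical "name:bytes" pairs, for instance copied from the sizing file.
for entry in "topic_a:864000000" "topic_b:2147483648"; do
    name=${entry%%:*}
    bytes=${entry##*:}
    if fits_main_retention "$bytes"; then
        echo "$name: OK ($bytes bytes)"
    else
        echo "$name: exceeds main setting ($bytes > $main_retention_bytes)"
    fi
done
```

Here topic_a passes and topic_b is reported as exceeding the main setting. In that case, either lower the topic value or raise the main setting and redeploy.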