Punch templates

Abstract

To help you build a uniform set of punchlines, the Punch leverages a Jinja2 renderer that generates punchlines and other configuration items. The purpose of this feature is to:

  • ensure punchlines homogeneity
  • provide a custom abstraction of the punchline
  • ease the migration process by applying changes to templates only
  • avoid bad configurations by using a validated and tested template

Punch templating is based on Jinjava, a Java implementation of Jinja2.
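To get an intuition for what the renderer does, here is a toy stand-in written in Python (the real engine is Jinjava, running on the JVM; `render` below is a hypothetical illustration that only handles `{{ dotted.names }}` substitution, not the full Jinja2 feature set):

```python
import re

def render(template: str, context: dict) -> str:
    """Toy stand-in for the Jinjava renderer: substitute {{ dotted.names }}."""
    def lookup(match):
        value = context
        for part in match.group(1).strip().split("."):
            value = value[part]  # descend into nested dicts
        return str(value)
    return re.sub(r"\{\{([^}]+)\}\}", lookup, template)

context = {"tenant": "mytenant", "channel": {"name": "apache_httpd"}}
line = "name: {{tenant}}_{{channel.name}}_archiving"
print(render(line, context))  # name: mytenant_apache_httpd_archiving
```

The real renderer additionally supports filters, conditionals and loops, as shown in the template examples further down.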

Quick tour

With the following file structure:

.
└── tenants
    └── mytenant
        └── etc
            ├── channel_config
            │   └── apache_httpd.yaml
            └── templates
                └── shiva_single_archiving
                    ├── archiving.yaml.j2
                    ├── channel_structure.yaml.j2
                    └── input.yaml.j2

run the following command:

channelctl -t mytenant configure tenants/mytenant/etc/channel_config/apache_httpd.yaml

and the following files will be generated:

.
└── tenants
    └── mytenant
        ├── channels
        │   └── apache_httpd
        │       ├── archiving.yaml
        │       ├── channel_structure.yaml
        │       └── input.yaml
        └── etc
            ├── channel_config
            │   └── apache_httpd.yaml
            └── templates
                └── shiva_single_archiving
                    ├── archiving.yaml.j2
                    ├── channel_structure.yaml.j2
                    └── input.yaml.j2

Configuration

Let us illustrate how to write templates for a channel composed of two pipelines, input and archiving. You will need to create the values file (tenants/mytenant/etc/channel_config/apache_httpd.yaml) and the Jinja templates (tenants/mytenant/etc/templates/shiva_single_archiving/*.j2). Do not forget to write documentation for your template (a README, for instance).

1 Bring your working punchline files

The following punchline and channel_structure files are the result you want to generate.

Channel Structure file

version: '6.0'
start_by_tenant: true
stop_by_tenant: true

resources:
  - type: kafka_topic
    name: mytenant_apache_httpd_archiving
    cluster: common
    partitions: 1
    replication_factor: 1

applications:
  - name: input
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - input.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start
  - name: archiving
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - archiving.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start

Input punchline

version: '6.0'
runtime: storm
channel: apache_httpd
type: punchline
meta:
  vendor: apache
  technology: apache_httpd
dag:

  # Syslog
  - type: syslog_input
    settings:
      listen:
        proto: tcp
        host: 0.0.0.0
        port: 9901
      self_monitoring.activation: true
      self_monitoring.period: 10
    publish:
      - stream: logs
        fields:
          - log
          - _ppf_local_host
          - _ppf_local_port
          - _ppf_remote_host
          - _ppf_remote_port
          - _ppf_timestamp
          - _ppf_id
      - stream: _ppf_metrics
        fields:
          - _ppf_latency

  # Punchlet node
  - type: punchlet_node
    component: punchlet
    settings:
      punchlet_json_resources:
        - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/resources/http_codes.json
        - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/resources/taxonomy.json
      punchlet:
        - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/input.punch
        - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/parsing_syslog_header.punch
        - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/parser_apache_httpd.punch
        - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/enrichment.punch
        - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/normalization.punch
    subscribe:
      - component: syslog_input
        stream: logs
      - component: syslog_input
        stream: _ppf_metrics
    publish:
      - stream: logs
        fields:
          - log
          - _ppf_id
          - _ppf_timestamp
      - stream: _ppf_errors
        fields:
          - _ppf_error_message
          - _ppf_error_document
          - _ppf_id
      - stream: _ppf_metrics
        fields:
          - _ppf_latency

  # ES Output
  - type: elasticsearch_output
    settings:
      per_stream_settings:
        - stream: logs
          index:
            type: daily
            prefix: mytenant-events-
          document_json_field: log
          document_id_field: _ppf_id
          additional_document_value_fields:
            - type: date
              document_field: '@timestamp'
              format: iso
        - stream: _ppf_errors
          document_json_field: _ppf_error_document
          additional_document_value_fields:
            - type: tuple_field
              document_field: ppf_error_message
              tuple_field: _ppf_error_message
            - type: date
              document_field: '@timestamp'
              format: iso
          index:
            type: daily
            prefix: mytenant-events-
    subscribe:
      - component: punchlet
        stream: logs
      - component: punchlet
        stream: _ppf_errors
      - component: punchlet
        stream: _ppf_metrics

  # Kafka Output
  - type: kafka_output
    settings:
      topic: mytenant_apache_httpd_archiving
      encoding: lumberjack
      producer.acks: all
      producer.batch.size: 16384
      producer.linger.ms: 5
    subscribe:
      - component: punchlet
        stream: logs
      - component: punchlet
        stream: _ppf_metrics
metrics:
  reporters:
    - type: kafka
settings:
  topology.component.resources.onheap.memory.mb: 200 # 200m * (4 nodes + 1 topo) = 1G

Archiving punchline

version: '6.0'
type: punchline
channel: apache_httpd
runtime: storm
dag:

  # Kafka Input
- type: kafka_input
  settings:
    topic: mytenant_apache_httpd_archiving
    start_offset_strategy: last_committed
    fail_action: exit
  publish:
  - stream: logs
    fields:
    - log
    - _ppf_id
    - _ppf_timestamp
    - _ppf_partition_id
    - _ppf_partition_offset
  - stream: _ppf_metrics
    fields:
    - _ppf_latency

  # File Output
- type: file_output
  settings:
    create_root: true
    destination: file:///tmp/archive-logs/storage # File system
    topic: apache_httpd
    file_prefix_pattern: '%{topic}/%{date}/puncharchive-%{tags}-%{offset}'
    batch_size: 1000
    batch_expiration_timeout: 10s
    fields:
    - _ppf_id
    - _ppf_timestamp
    - log
    encoding: csv
    compression_format: gzip
    separator: __|__
    timestamp_field: _ppf_timestamp
  subscribe:
  - component: kafka_input
    stream: logs
  - component: kafka_input
    stream: _ppf_metrics
  publish:
  - stream: metadatas
    fields:
    - metadata
  - stream: _ppf_metrics
    fields:
    - _ppf_latency

  # ES Output
- type: elasticsearch_output
  component: metadatas_indexer
  settings:
    per_stream_settings:
    - stream: metadatas
      index:
        type: daily
        prefix: mytenant-archive-
      document_json_field: metadata
      batch_size: 1
      reindex_failed_documents: true
      error_index:
        type: daily
        prefix: mytenant-archive-errors
  subscribe:
  - component: file_output
    stream: metadatas
  - component: file_output
    stream: _ppf_metrics

  # Metrics
metrics:
  reporters:
  - type: kafka
settings:
  topology.component.resources.onheap.memory.mb: 56 # 56m * (3 nodes + 1 topo) = 224m

2 Identify the variable parts

You might want to reuse these punchlines for several tenants and several types of logs. In that case you will need to create templates to easily generate the punchlines. The parts of your punchlines you want to templatize are:

  • elasticsearch index names
  • kafka topic name
  • listening input port
  • punchlets and json_resources

3 Create your value file

You can create your value file wherever you want. Let us create it at the following path:

tenants/mytenant/etc/channel_config/apache_httpd.yaml

The following values are required:

  • channel string

    MANDATORY

    The name of the channel you want to create

  • tenant string

    MANDATORY

    The name of the tenant

  • channel_structure_profile string

    MANDATORY

    The name of the template directory. This is not a full path but only the name of the directory itself. The configure command will look for a directory in the predefined template directory locations.

You can then set custom values to use in your template.

Here is the value file corresponding to the example above:

channel_structure_profile: shiva_single_archiving
vendor: apache
tenant: mytenant
channel: apache_httpd
input:
  host: 0.0.0.0
  port: 9901
json_resources:
  - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/resources/http_codes.json
  - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/resources/taxonomy.json
punchlets:
  - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/input.punch
  - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/parsing_syslog_header.punch
  - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/parser_apache_httpd.punch
  - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/enrichment.punch
  - punch-webserver-parsers-1.0.0/com/thalesgroup/punchplatform/webserver/apache_httpd/normalization.punch
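Before rendering, the three mandatory values listed above must be present in the values file. A minimal sketch of such a check (`check_values` is a hypothetical helper, not part of the Punch CLI; the values file would normally be loaded with a YAML parser):

```python
# The three mandatory keys documented above.
MANDATORY = ("channel", "tenant", "channel_structure_profile")

def check_values(values: dict) -> list:
    """Return the list of mandatory keys missing from the values file."""
    return [key for key in MANDATORY if key not in values]

values = {
    "channel_structure_profile": "shiva_single_archiving",
    "tenant": "mytenant",
    "channel": "apache_httpd",
    "vendor": "apache",  # custom value, not mandatory
}
print(check_values(values))  # → []
```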

4 Create the template

Template directory locations

The configure command looks for a template directory whose name is defined by the channel_structure_profile value. It looks in the following places, in this order:

  • $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/etc/templates/
  • $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/templates/
  • $PUNCHPLATFORM_CONF_DIR/templates
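The lookup order above can be sketched as follows (`find_profile_dir` is a hypothetical helper mirroring the documented behaviour, not an actual Punch function):

```python
import os

def find_profile_dir(conf_dir: str, tenant: str, profile: str):
    """Return the first existing template directory, in the documented order."""
    candidates = [
        os.path.join(conf_dir, "tenants", tenant, "etc", "templates", profile),
        os.path.join(conf_dir, "tenants", tenant, "templates", profile),
        os.path.join(conf_dir, "templates", profile),
    ]
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None  # no matching template directory found
```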

Controlling the generated files

Each generated file ends up in the tenants/<tenant>/channels/<channel.name>/ directory. The channel name is defined by the channel value.

The configure command first looks for a channel_structure file in the template directory and applies templating to it.

Once the channel_structure file is generated, the configure command uses the provided applications list (the applications value in the channel_structure file) to generate the corresponding punchlines.

The following settings of your channel_structure applications let you control how your punchlines are generated:

  • applications[].name string

    MANDATORY

The name of the generated file. The configure command derives the output extension from the template file extension (e.g. name: input with template file input.yaml.j2 generates input.yaml).

  • applications[].template string

    OPTIONAL

The name of the template file to use. If not specified, the configure command looks for a file matching <application_name>.yaml.j2, <application_name>.yml.j2, <application_name>.json.j2 or <application_name>.hjson.j2.

  • applications[].additional_templates list of dict

    OPTIONAL

A list of additional templates you want to render for this application. For instance, when configuring a plan you want to generate both the punchline and the plan files.

  • applications[].additional_templates[].template string

    OPTIONAL

The name of the additional template file to use.

  • applications[].additional_templates[].output string

    OPTIONAL

The name of the generated file. It must include the file extension (e.g. input.yaml).

  • applications[].*

    OPTIONAL

Custom values to use in your template. They are only available at the rendered application level. This is useful if you plan to use the same template to generate multiple punchlines with slight differences, such as the Kafka partition to consume.
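The template selection and output-naming rules above can be sketched like this (`resolve_template` and `output_name` are hypothetical helpers; the actual configure command implementation may differ):

```python
import os

# Fallback extensions tried when applications[].template is not set.
EXTENSIONS = (".yaml.j2", ".yml.j2", ".json.j2", ".hjson.j2")

def resolve_template(template_dir: str, app: dict):
    """Pick the template for an application: the explicit 'template' setting
    wins; otherwise try <name><ext> for each documented extension."""
    if "template" in app:
        return os.path.join(template_dir, app["template"])
    for ext in EXTENSIONS:
        candidate = os.path.join(template_dir, app["name"] + ext)
        if os.path.isfile(candidate):
            return candidate
    return None

def output_name(template_file: str) -> str:
    """Derive the generated file name by stripping the trailing .j2
    (e.g. input.yaml.j2 -> input.yaml)."""
    return os.path.basename(template_file)[: -len(".j2")]
```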

Template files

Inside the templates you can use Jinja2 (Jinjava) syntax.

The following Jinja variables are available for channel_structure and application rendering:

  • channel.*:

    all values defined inside your values file

  • channel.name:

    the name of the channel (same as channel.channel)

  • PUNCHPLATFORM_CONF_DIR

    the absolute path to the PUNCHPLATFORM_CONF_DIR

  • punchplatform.*

all the settings of your generated punchplatform.properties. You will find this file at $PUNCHPLATFORM_PROPERTIES_FILE

  • tenant

    the tenant as declared in the values file (same as channel.tenant)

The following Jinja variables are only available for application rendering:

  • topology.*

all settings defined in your channel structure and relative to the application you are rendering (e.g. {{ topology.shiva_runner_tags }} gives you the list of shiva tags defined).
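Putting the variable list together, the context handed to the renderer can be pictured roughly like this (a sketch: only the variable names come from the documentation above, the dict structure is assumed):

```python
# Values file content, as a plain dict (a YAML loader would produce this).
values = {"tenant": "mytenant", "channel": "apache_httpd", "vendor": "apache"}

# Sketch of the render context: channel.* exposes every value from the values
# file, plus channel.name (same as channel.channel); tenant is a shortcut
# for channel.tenant.
context = {
    "channel": dict(values, name=values["channel"]),
    "tenant": values["tenant"],
}
print(context["channel"]["name"])    # apache_httpd
print(context["channel"]["tenant"])  # mytenant, same as the tenant shortcut
```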

The following example shows the template files used to generate the channel:

Template of Channel Structure file

version: '6.0'
start_by_tenant: true
stop_by_tenant: true

resources:
  - type: kafka_topic
    name: {{tenant}}_{{channel.name}}_archiving
    cluster: common
    partitions: 1
    replication_factor: 1

applications:
- name: input
  runtime: shiva
  command: punchlinectl
  args:
  - start
  - --punchline
  - input.yaml
  shiva_runner_tags:
  - {{ channel.cluster_name|default('common') }}
  cluster: {{ channel.cluster_name|default('common') }}
  reload_action: kill_then_start
- name: archiving
  runtime: shiva
  command: punchlinectl
  args:
    - start
    - --punchline
    - archiving.yaml
  shiva_runner_tags:
    - common
  cluster: common
  reload_action: kill_then_start

Template of Input punchline

version: '6.0'
runtime: storm
channel: {{channel.name}}
type: punchline
meta:
  vendor: {{channel.vendor}}
  technology: {{channel.name}}
dag:

  # Syslog
  - type: syslog_input
    settings:
      listen:
        proto: tcp
        host: {{channel.input.host}}
        port: {{channel.input.port}}
      self_monitoring.activation: true
      self_monitoring.period: 10
    publish:
    - stream: logs
      fields:
      - log
      - _ppf_local_host
      - _ppf_local_port
      - _ppf_remote_host
      - _ppf_remote_port
      - _ppf_timestamp
      - _ppf_id
    - stream: _ppf_metrics
      fields:
      - _ppf_latency

  # Punchlet node
  - type: punchlet_node
    component: punchlet
    settings:
      {%- if channel.json_resources is defined and channel.json_resources | length > 0 %}
      punchlet_json_resources:
      {%- for jsonResource in channel.json_resources %}
      - {{ jsonResource }}
      {%- endfor %}
      {%- endif %}
      punchlet:
      {%- for punchlet in channel.punchlets %}
      - {{punchlet}}
      {%- endfor %}
    subscribe:
    - component: syslog_input
      stream: logs
    - component: syslog_input
      stream: _ppf_metrics
    publish:
    - stream: logs
      fields:
      - log
      - _ppf_id
      - _ppf_timestamp
    - stream: _ppf_errors
      fields:
      - _ppf_error_message
      - _ppf_error_document
      - _ppf_id
    - stream: _ppf_metrics
      fields:
      - _ppf_latency

  # ES Output
  - type: elasticsearch_output
    settings:
      per_stream_settings:
      - stream: logs
        index:
          type: daily
          prefix: {{channel.tenant}}-events-
        document_json_field: log
        document_id_field: _ppf_id
        additional_document_value_fields:
        - type: date
          document_field: '@timestamp'
          format: iso
      - stream: _ppf_errors
        document_json_field: _ppf_error_document
        additional_document_value_fields:
        - type: tuple_field
          document_field: ppf_error_message
          tuple_field: _ppf_error_message
        - type: date
          document_field: '@timestamp'
          format: iso
        index:
          type: daily
          prefix: {{channel.tenant}}-events-
    subscribe:
    - component: punchlet
      stream: logs
    - component: punchlet
      stream: _ppf_errors
    - component: punchlet
      stream: _ppf_metrics

  # Kafka Output
  - type: kafka_output
    settings:
      topic: {{channel.tenant}}_{{channel.name}}_archiving
      encoding: lumberjack
      producer.acks: all
      producer.batch.size: 16384
      producer.linger.ms: 5
    subscribe:
      - component: punchlet
        stream: logs
      - component: punchlet
        stream: _ppf_metrics
metrics:
  reporters:
  - type: kafka
settings:
  topology.component.resources.onheap.memory.mb: 200 # 200m * (4 nodes + 1 topo) = 1G

Template of Archiving punchline

version: '6.0'
type: punchline
channel: {{channel.name}}
runtime: storm
dag:

  # Kafka Input
- type: kafka_input
  settings:
    topic: {{tenant}}_{{channel.name}}_archiving
    start_offset_strategy: last_committed
    fail_action: exit
  publish:
  - stream: logs
    fields:
    - log
    - _ppf_id
    - _ppf_timestamp
    - _ppf_partition_id
    - _ppf_partition_offset
  - stream: _ppf_metrics
    fields:
    - _ppf_latency

  # File Output
- type: file_output
  settings:
    create_root: true
    destination: file:///tmp/archive-logs/storage # File system
    topic: {{channel.name}}
    file_prefix_pattern: '%{topic}/%{date}/puncharchive-%{tags}-%{offset}'
    batch_size: 1000
    batch_expiration_timeout: 10s
    fields:
    - _ppf_id
    - _ppf_timestamp
    - log
    encoding: csv
    compression_format: gzip
    separator: __|__
    timestamp_field: _ppf_timestamp
  subscribe:
  - component: kafka_input
    stream: logs
  - component: kafka_input
    stream: _ppf_metrics
  publish:
  - stream: metadatas
    fields:
    - metadata
  - stream: _ppf_metrics
    fields:
    - _ppf_latency

  # ES Output
- type: elasticsearch_output
  component: metadatas_indexer
  settings:
    per_stream_settings:
    - stream: metadatas
      index:
        type: daily
        prefix: {{tenant}}-archive-
      document_json_field: metadata
      batch_size: 1
      reindex_failed_documents: true
      error_index:
        type: daily
        prefix: {{tenant}}-archive-errors
  subscribe:
  - component: file_output
    stream: metadatas
  - component: file_output
    stream: _ppf_metrics

  # Metrics
metrics:
  reporters:
  - type: kafka
settings:
  topology.component.resources.onheap.memory.mb: 56 # 56m * (3 nodes + 1 topo) = 224m