Punchlines

The next concept to understand is the Punchline. A Punchline is a data pipeline configured to fetch or receive data, process it, and push it downstream.

Here we will focus on stream punchlines. In the next chapter, we will focus on batch punchlines running in Spark.

Have a look at the input.yaml file:

version: '6.0'
runtime: storm
type: punchline
channel: stormshield
meta:
  vendor: stormshield
dag:

  # Syslog input
  - type: syslog_input
    settings:
      listen:
        proto: tcp
        host: 127.0.0.1
        port: 9903
    publish:
      - stream: logs
        fields:
          - log
          - _ppf_local_host
          - _ppf_local_port
          - _ppf_remote_host
          - _ppf_remote_port
          - _ppf_timestamp
          - _ppf_id

  # Punchlet node
  - type: punchlet_node
    settings:
      punchlet_json_resources: []
      punchlet:
        - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/input.punch
        - punch-common-punchlets-1.0.0/com/thalesgroup/punchplatform/common/parsing_syslog_header.punch
        - punch-stormshield-parsers-1.0.0/com/thalesgroup/punchplatform/stormshield/network_security/parser_network_security.punch
    subscribe:
      - component: syslog_input
        stream: logs
    publish:
      - stream: logs
        fields:
          - log
          - _ppf_id
      - stream: _ppf_errors
        fields:
          - _ppf_error_message
          - _ppf_error_document
          - _ppf_id

  # ES Output
  - type: elasticsearch_output
    settings:
      per_stream_settings:
        - stream: logs
          index:
            type: daily
            prefix: mytenant-events-
          document_json_field: log
          document_id_field: _ppf_id
          additional_document_value_fields:
            - type: date
              document_field: '@timestamp'
              format: iso
        - stream: _ppf_errors
          document_json_field: _ppf_error_document
          additional_document_value_fields:
            - type: tuple_field
              document_field: ppf_error_message
              tuple_field: _ppf_error_message
            - type: date
              document_field: '@timestamp'
              format: iso
          index:
            type: daily
            prefix: mytenant-events-
          document_id_field: _ppf_id
    subscribe:
      - component: punchlet_node
        stream: logs
      - component: punchlet_node
        stream: _ppf_errors

# Topology metrics
metrics:
  reporters:
    - type: kafka

# Topology settings
settings:
  topology.component.resources.onheap.memory.mb: 56 # 56m * (3 nodes) = 168m

It implements a stream pipeline which:

  • receives logs on a TCP socket.
  • parses and enriches these logs with punchlets.
  • indexes the transformed logs into Elasticsearch.

Start this punchline in the foreground:

punchlinectl --tenant mytenant start --punchline $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/stormshield_networksecurity/input.yaml

A stream pipeline is now running and ready to receive logs.
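If you want to check that the listener is up before running the injector, you can send a single line to the TCP socket yourself. This is a minimal sketch: the log line below is a hypothetical sample (the injector generates properly formatted Stormshield logs), and the host and port are taken from the syslog_input settings above. It assumes netcat (`nc`) is installed.

```shell
# Hypothetical sample line; real Stormshield logs are produced by the injector.
MSG='id=firewall time="2024-01-01 00:00:00" fw="fw1" pri=5 msg="sample"'

# Send it to the punchline's TCP listener (host/port from syslog_input).
printf '%s\n' "$MSG" | nc 127.0.0.1 9903
```

If the punchline is running, the line is picked up by the syslog_input node and flows through the dag like any injected log.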

Now, we will inject some logs using the Punch injector tool, which generates Stormshield logs and sends them to your punchline.

In another terminal:

punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant/stormshield_networksecurity_injector.json

Check Elasticsearch: your logs are now indexed in mytenant-events-*.
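One way to check is to count the documents in today's index. This sketch assumes Elasticsearch listens on localhost:9200 and that the daily index suffix follows the common yyyy.MM.dd convention; the prefix comes from the elasticsearch_output settings above.

```shell
# Build today's daily index name, e.g. mytenant-events-2024.01.01
# (assumes the yyyy.MM.dd suffix convention for daily indices).
INDEX="mytenant-events-$(date -u +%Y.%m.%d)"

# Count indexed documents (assumes Elasticsearch on localhost:9200).
curl -s "localhost:9200/${INDEX}/_count"
```

You can also simply query mytenant-events-* from Kibana.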

As you can see, punchlines are quite simple to understand, and you can do all sorts of stream computing with them. Now that you have a good understanding of stream punchlines, let's move on to batch punchlines.