Stormline

Stormline CRD instances are managed by the Stormline Operator.

Stormline instances are designed to address streaming pipeline use cases.

Note

Although this is not common in the big-data world, batch use cases are also supported on a subset of our supported nodes, e.g. extracting data from Elasticsearch.

Prerequisites

In order to run a Stormline, all you need is:

  • a Kubernetes cluster with the Stormline Operator deployed
  • a Platform CRD instance for your Stormline to reference

And that's it! You can now create a Stormline job and submit it to your Kubernetes cluster.
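
For instance, assuming you saved your manifest as my-job.yaml (a hypothetical file name), submission is a single kubectl call; note that the plural resource name stormlines is our assumption:

# Submit the Stormline manifest to the cluster
kubectl apply -f my-job.yaml
# List Stormline instances (plural resource name is an assumption)
kubectl get stormlines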

Job Settings

Your job configuration is a simple YAML file, composed of two main parts:

  • The metadata of your job
  • The spec of your job

# Example
apiVersion: punchline.gitlab.thalesdigital.io/v1
kind: Stormline
metadata:
  ...
spec:
  ...

Settings such as apiVersion, kind and metadata are standard Kubernetes fields. The metadata field is propagated to all of the Stormline instance's sub-resources.

Some of your Stormline settings, such as the references to your secrets, service account or resourcectl image, should be provided by your Platform CRD. This way, you can focus on your job rather than on your platform configuration.

Metadata

In this section, we set the name, labels and annotations of the Stormline. Even though labels are not required, setting correct ones makes it easier to select resources and provides custom labels for your metrics and dashboards.

Recommended labels:

Name     Type      Description
app      Optional  Name of your Stormline.
channel  Optional  Name of your channel.
tenant   Optional  Name of your tenant.

Annotations:

Name                                       Type      Description
platform.gitlab.thalesdigital.io/platform  Required  Name of your Platform CRD.
prometheus.io/scrape                       Optional  Set to "true" to activate Prometheus metrics.
prometheus.io/path                         Optional  Set to "/metrics" when activating Prometheus metrics.
prometheus.io/port                         Optional  Port exposing Prometheus metrics (usually "9101").

# Example
metadata:
  name: my-job
  labels:
    app: my-job
    tenant: my-tenant
    channel: my-channel
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9101"
    platform.gitlab.thalesdigital.io/platform: platform
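
With the Prometheus annotations above, the job's metrics become scrapable on port 9101. As a quick check, assuming the job's pod is named my-job-0 (a hypothetical name), you can query the endpoint yourself:

# Forward the metrics port locally (pod name is hypothetical)
kubectl port-forward pod/my-job-0 9101:9101 &
# Fetch the first metrics lines
curl -s localhost:9101/metrics | head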

Spec

In this section, we provide the main Stormline settings.

Name          Type      Description
image         Required  Stormline image reference.
punchline     Required  DAG and settings of your punchline.
metrics       Optional  Reporters and metrics settings.
dependencies  Optional  Artifact IDs of your job's dependencies.

# Example
spec:
  image: ghcr.io/punchplatform/stormline:7.0.1
  metrics:
    port: 9101
  dependencies:
    - punch-parsers:org.thales.punch:my-punchlets:1.0.0
  punchline:
    platform: my-platform
    dag:
      ...

Advanced Concepts

Stormline Operator lifecycle

Note

Only the core loop is described below; this is not the complete lifecycle of a Stormline instance.

A Stormline instance can go through five different phases, similar to Pod phases: Pending, Running, Succeeded, Failed and Unknown.

  • When an instance of a Stormline CRD is submitted to the Kubernetes API server, its status is initially empty.
  • The Stormline Operator catches the submission event and updates the instance status to Pending.
  • During the Pending phase, the required Kubernetes sub-resources (pods and configmaps) are created by the Stormline Operator.
  • OwnerReferences pointing to the Stormline instance are set on all created sub-resources.
  • Once all sub-resources are created successfully, the instance status is updated to Running.
  • While in the Running phase, the Stormline Operator aggregates the statuses of all sub-resources owned by the instance and updates the instance status based on the aggregated result. A simple way to follow this loop from the outside is shown below.
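
A minimal way to observe these transitions, assuming the operator publishes the current phase under .status.phase (the exact field path is an assumption, as is the plural resource name stormlines):

# Watch the instance and its phase transitions
kubectl get stormlines my-job --watch
# Or extract just the current phase (field path is an assumption)
kubectl get stormlines my-job -o jsonpath='{.status.phase}'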

Advanced Settings

spec:
  # Defaults to false if not set
  # States which workload type runs the stormline instance:
  # true: the instance runs as a Pod
  # false (default): the instance runs as a Deployment
  oneshot: false

  # Combine with oneshot to garbage-collect the instance upon reaching the Succeeded phase
  # Defaults to false
  garbageCollect: false

  # Usually filled by platform crd
  # Define a service account in case additional RBAC rules or imagePullSecrets are needed at runtime
  serviceAccount: admin-user

  # Usually filled by platform crd
  # Init container image, usually resourcectl
  initContainerImage: resourcectl:7.0.1

  # Usually filled by platform crd
  # This field enables you to mount secret resources belonging to the same namespace as the stormline instance,
  # so that your program can consume them for various purposes, e.g. fetching data from an elasticsearch cluster.
  secretRefs:
    - name: "resourcectl-tls"
      mountPath: "/var/run/kubernetes/platform/secrets/resourcectl/resourcectl-tls"

  # Additional stormline settings
  settings:
    childopts: -server -Xms200m -Xmx200m

  # Define additional files you want to be mounted on the container filesystem during runtime
  # key: file_name
  # value: file_content
  configs:
    # This will create a file 'myCustomConfMountedOnPod'
    # with content: <value of the key>
    myCustomConfMountedOnPod: |
      # this content will be mounted on
      # the pod container local filesystem at
      # /data/myCustomConfMountedOnPod
      test: hello world
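
To verify that such a file ends up where expected, assuming the instance's pod is named my-job-0 (a hypothetical name), you can read it back from the running container:

# Print the mounted file from inside the pod (pod name is hypothetical)
kubectl exec my-job-0 -- cat /data/myCustomConfMountedOnPod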

Example(s)

Syslog to Elasticsearch and Kafka

---
# Kube section
apiVersion: punchline.gitlab.thalesdigital.io/v1
kind: Stormline
metadata:
  name: sourcefire-input
  labels:
    app: sourcefire-input
    tenant: mytenant
    channel: sourcefire
    technology: cisco
  annotations:
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9101"
    prometheus.io/scrape: "true"
    platform.gitlab.thalesdigital.io/platform: platform
spec:
  image: ghcr.io/punchplatform/stormline:7.0.1-SNAPSHOT

  # Prometheus metrics endpoint
  metrics:
    port: 9101


  # Dependencies section
  dependencies:
    - punch-parsers:org.thales.punch:punch-cisco-parsers:1.0.0
    - punch-parsers:org.thales.punch:punch-common-punchlets:1.0.2
    - file:org.thales.punch:punch-geoipv4-resources:1.0.0
    - file:org.thales.punch:punch-geoipv6-resources:1.0.0

  # Punchline section
  punchline:
    platform: dev4
    settings:
      topology.worker.childopts: "-server -Xms1g -Xmx1g"
      topology.max.spout.pending: "2000"
      topology.enable.message.timeouts: "true"
      topology.message.timeout.secs: "60"

    dag:

      # Syslog input
      - type: syslog_input
        component: syslog_input
        settings:
          listen:
            proto: tcp
            host: 0.0.0.0
            port: 9904
        publish:
          - stream: logs
            fields:
              - log
              - _ppf_local_host
              - _ppf_local_port
              - _ppf_remote_host
              - _ppf_remote_port
              - _ppf_timestamp
              - _ppf_id

      # Punchlet Parser
      - type: punchlet_node
        component: punchlet_simple
        settings:
          punchlet_json_resources: []
          punchlet_grok_pattern_dirs:
            - org/thales/punch/common/groks
          punchlet:
            - org/thales/punch/common/input.punch
            - org/thales/punch/common/parsing_syslog_header.punch
            - org/thales/punch/cisco/sourcefire/parser_sourcefire.punch
            - org/thales/punch/common/geoip.punch
        subscribe:
          - component: syslog_input
            stream: logs
        publish:
          - stream: logs
            fields:
              - log
              - _ppf_id
              - _ppf_timestamp

      # Elastic Output
      - type: elasticsearch_output
        component: es_output
        settings:
          cluster_id: es_search
          http_hosts:
            - host: punchplatform-es-http.doc-store
              port: 9200
          credentials:
            user: elastic
            password: elastic

          # Production settings for errors
          reindex_failed_documents: true
          error_index:
            type: daily
            prefix: mytenant-events-indexation-errors-

          per_stream_settings:
            - stream: logs
              index:
                type: daily
                prefix: mytenant-events-
              document_json_field: log
              document_id_field: _ppf_id
              additional_document_value_fields:
                - type: date
                  document_field: "@timestamp"
                  format: iso
        subscribe:
          - component: punchlet_simple
            stream: logs


      # Kafka Output
      - type: kafka_output
        component: to_kafka
        settings:
          brokers: common
          bootstrap.servers: kafka-kafka-bootstrap.processing:9092
          topic: sourcefire-events
          encoding: lumberjack
          producer.acks: all
          producer.batch.size: 16384
          producer.linger.ms: 5
        subscribe:
          - component: punchlet_simple
            stream: logs
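
To smoke-test this pipeline once the pod is Running, you can forward the syslog port and send a sample line; the pod name sourcefire-input-0 is a hypothetical example:

# Make the syslog input reachable locally (pod name is hypothetical)
kubectl port-forward pod/sourcefire-input-0 9904:9904 &
# Send one sample syslog line over TCP
echo '<134>May 12 10:00:00 host cisco: test event' | nc -w1 localhost 9904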