
ClickHouse

If you are not familiar with ClickHouse, follow this short tour.

This sample is provided as a demo channel. First, go to its folder:

cd $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/flights

First, notice that a data injector is provided to generate sample flight data. Have a look at the flight_injector.json file: it generates key-value lines with various timestamps, a flight number, etc.

Run it to simply have a look at the generated data:

punchplatform-log-injector.sh -c flight_injector.json -n 10 --dump
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=01de62ed-9d2c-40d7-b225-cc6e67c33119 departure_timestamp=2020-02-01 02:00:00 arrival_timestamp=2020-02-05 13:45:13 flight_consumption=11518 flight_ip=192.168.99.241
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=1f6f100e-755c-44af-b724-13d91dc3f4ac departure_timestamp=2020-02-01 03:00:00 arrival_timestamp=2020-02-06 14:34:28 flight_consumption=19632 flight_ip=192.168.99.211
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=28d25953-303b-4e6f-a44b-c398388344f7 departure_timestamp=2020-02-01 04:00:00 arrival_timestamp=2020-02-01 14:19:42 flight_consumption=16136 flight_ip=192.168.99.133
...

Tip

This injector file illustrates how you can generate relative timestamps, i.e. timestamps that are related to each other. That is useful to have the arrival time generated after the departure time.
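The idea of relative timestamps can be sketched as follows. This Python snippet is an illustration of the concept, not the injector's actual implementation; the field names mirror those in the generated data.

```python
import random
from datetime import datetime, timedelta

def generate_flight(base: datetime) -> dict:
    """Generate one flight record whose arrival is expressed relative to its departure."""
    departure = base + timedelta(hours=random.randint(0, 23))
    # arrival is computed from the departure (1 hour to 5 days later),
    # so it is guaranteed to come after it
    arrival = departure + timedelta(minutes=random.randint(60, 5 * 24 * 60))
    return {
        "departure_timestamp": departure.strftime("%Y-%m-%d %H:%M:%S"),
        "arrival_timestamp": arrival.strftime("%Y-%m-%d %H:%M:%S"),
        "flight_consumption": random.randint(1000, 30000),
    }

record = generate_flight(datetime(2020, 2, 1))
```

Generating timestamps independently of each other would, sooner or later, produce flights that land before they take off.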

The punchline flight_punchline.yaml receives data on a TCP socket, parses the key-value format to convert each line into a map, then inserts the result into a destination flights table. That table must be created first.

Here is its content:

version: '6.0'
runtime: storm
type: punchline
dag:

# Syslog input
- type: syslog_input
  settings:
    listen:
      proto: tcp
      host: 0.0.0.0
      port: 9909
  publish:
  - stream: logs
    fields:
    - log

# Punchlet node
- component: punchlet
  type: punchlet_node
  settings:
    punchlet_code: '{kv().on([logs][log]).into([logs]);}'
  subscribe:
  - component: syslog_input
    stream: logs
  publish:
  - stream: logs
    fields:
    - event_timestamp
    - uniquecarrier_id
    - departure_timestamp
    - arrival_timestamp
    - flight_consumption
    - flight_ip

# Clickhouse output
- component: output
  type: clickhouse_output
  settings:
    hosts:
    - localhost:8123
    username: default
    password: ''
    database: default
    table: flights
    bulk_size: 10
    column_names:
    - event_timestamp
    - uniquecarrier_id
    - departure_timestamp
    - arrival_timestamp
    - flight_consumption
    - flight_ip
  subscribe:
  - component: punchlet
    stream: logs

# Topology metrics
metrics:
  reporters:
  - type: kafka

# Topology settings
settings:
  topology.component.resources.onheap.memory.mb: 100
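The punchlet node above relies on the kv() operator to turn each key=value line into a map. As a rough illustration (a Python sketch, not the actual Punch implementation), that parsing step amounts to:

```python
import re

def parse_kv(line: str) -> dict:
    """Split a 'key=value key=value ...' line into a dict.

    Values may themselves contain spaces (e.g. a 'YYYY-MM-DD hh:mm:ss'
    timestamp), so split only on whitespace that precedes a new 'key=' token.
    """
    parts = re.split(r"\s+(?=\w+=)", line.strip())
    return dict(part.split("=", 1) for part in parts)

# one of the injector's sample lines (shortened)
line = ("event_timestamp=2021-02-14 09:36:55 "
        "uniquecarrier_id=01de62ed-9d2c-40d7-b225-cc6e67c33119 "
        "flight_consumption=11518 flight_ip=192.168.99.241")
record = parse_kv(line)
```

The resulting map fields are then published on the stream, matching the column_names declared in the clickhouse_output settings.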
To use it you must first create a table in ClickHouse. Execute the following create statement using the clickhouse client.

create table flights (
  uniquecarrier_id UUID,
  event_timestamp DateTime,
  arrival_timestamp DateTime,
  departure_timestamp DateTime,
  flight_consumption UInt32,
  flight_ip IPv4)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_timestamp)
ORDER BY event_timestamp;
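The PARTITION BY toYYYYMMDD(event_timestamp) clause groups rows into one partition per day: each timestamp is mapped to a numeric YYYYMMDD key. This Python sketch mimics that mapping for illustration:

```python
from datetime import datetime

def to_yyyymmdd(ts: datetime) -> int:
    """Mimic ClickHouse's toYYYYMMDD(): map a timestamp to its daily partition key."""
    return ts.year * 10000 + ts.month * 100 + ts.day

# all events of 2021-02-14 land in the same partition, whatever the time of day
key = to_yyyymmdd(datetime(2021, 2, 14, 9, 13, 58))
```

Daily partitions let ClickHouse prune whole days when a query filters on event_timestamp, and make dropping old data cheap.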

To launch the clickhouse client, run:

$PUNCHPLATFORM_CLICKHOUSE_INSTALL_DIR/usr/bin/clickhouse client --port 9100

Now that your table has been created, use channelctl to start the example channel:

channelctl start --channel flights

Inject some data:

punchplatform-log-injector.sh -c flight_injector.json -n 10

And check that your data is in ClickHouse:

$PUNCHPLATFORM_CLICKHOUSE_INSTALL_DIR/usr/bin/clickhouse client --port 9100
 :) select * from flights
 Query id: 56caa712-495e-4f56-ba1b-91ad065b5e54

┌─────────────────────uniquecarrier_id─┬─────event_timestamp─┬───arrival_timestamp─┬─departure_timestamp─┬─flight_consumption─┬─flight_ip──────┐
│ c68a3b35-3fbf-4e42-9b59-62ff73c4268b │ 2021-02-14 09:13:58 │ 2020-02-03 11:20:53 │ 2020-02-01 01:00:00 │              22569 │ 192.168.99.79  │
│ 4e6ad8e0-307c-4f38-a9c5-a0b0ec4f33aa │ 2021-02-14 09:13:58 │ 2020-02-04 09:32:47 │ 2020-02-01 02:00:00 │               2826 │ 192.168.99.86  │
│ 785873e3-3ac0-4527-8acb-78813cf865a4 │ 2021-02-14 09:13:58 │ 2020-02-02 07:46:39 │ 2020-02-01 03:00:00 │              22382 │ 192.168.99.207 │
│ bf6a5983-b911-4210-aad1-3d52533e1d70 │ 2021-02-14 09:13:58 │ 2020-02-04 21:54:59 │ 2020-02-01 04:00:00 │              10480 │ 192.168.99.129 │
│ b6550c4b-fb13-460d-b88b-7b967244a039 │ 2021-02-14 09:13:58 │ 2020-02-02 05:16:52 │ 2020-02-01 05:00:00 │              14923 │ 192.168.99.20  │
│ e4102d0d-372f-41c2-9a1f-67ef96de1718 │ 2021-02-14 09:13:58 │ 2020-02-03 21:32:07 │ 2020-02-01 06:00:00 │              29558 │ 192.168.99.227 │
│ df494d46-e020-4d0c-a804-c4c1476a48ba │ 2021-02-14 09:13:58 │ 2020-02-01 13:08:00 │ 2020-02-01 07:00:00 │               3284 │ 192.168.99.72  │
│ 9bcfab97-8e92-4d67-a0fb-ca45827c4cc1 │ 2021-02-14 09:13:58 │ 2020-02-04 04:00:15 │ 2020-02-01 08:00:00 │              15636 │ 192.168.99.3   │
│ a0cdf9ef-1b06-4ab4-9fda-50a464534ec0 │ 2021-02-14 09:13:58 │ 2020-02-02 06:05:24 │ 2020-02-01 09:00:00 │              27845 │ 192.168.99.235 │
│ 1b234b9b-98fb-4436-8ebc-6dd49c1220b0 │ 2021-02-14 09:13:58 │ 2020-02-01 15:00:03 │ 2020-02-01 10:00:00 │              13438 │ 192.168.99.61  │
└──────────────────────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────────────┴────────────────┘

10 rows in set. Elapsed: 0.002 sec.
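These ten rows arrived together because the output node was configured with bulk_size: 10: rows are buffered and flushed as a single multi-row insert. This Python sketch illustrates the batching idea (it is not the node's actual implementation, and real code would bind values rather than quote them naively):

```python
def make_bulk_insert(table: str, columns: list, rows: list) -> str:
    """Render a buffered batch of rows as one multi-row INSERT statement."""
    cols = ", ".join(columns)
    # one (v1, v2, ...) tuple per buffered row
    values = ", ".join(
        "(" + ", ".join(repr(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({cols}) VALUES {values}"

stmt = make_bulk_insert(
    "flights",
    ["flight_consumption", "flight_ip"],
    [(22569, "192.168.99.79"), (2826, "192.168.99.86")],
)
```

Inserting rows in bulks rather than one by one is the recommended way to feed ClickHouse: each insert creates a data part, so larger batches mean fewer parts to merge.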

Refer to the ClickHouse output node documentation for all configuration options.