Clickhouse¶
If you are not familiar with clickhouse, follow this short tour.
This sample is provided as a demo channel. First go to that folder:
cd $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/flights
First notice that a data injector is provided that generates sample flight data. H ave a look at the flight_injector.json file. It generates key value lines with various timestamps, flight number etc..
Try it to sipmy have a look at the generated data:
punchplatform-log-injector.sh -c flight_injector.json -n 10 --dump
9.99
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=01de62ed-9d2c-40d7-b225-cc6e67c33119 departure_timestamp=2020-02-01 02:00:00 arrival_timestamp=2020-02-05 13:45:13 flight_consumption=11518 flight_ip=192.168.99.241
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=1f6f100e-755c-44af-b724-13d91dc3f4ac departure_timestamp=2020-02-01 03:00:00 arrival_timestamp=2020-02-06 14:34:28 flight_consumption=19632 flight_ip=192.168.99.211
event_timestamp=2021-02-14 09:36:55 uniquecarrier_id=28d25953-303b-4e6f-a44b-c398388344f7 departure_timestamp=2020-02-01 04:00:00 arrival_timestamp=2020-02-01 14:19:42 flight_consumption=16136 flight_ip=192.168.99.133
...
Tip
this injector file illustrates how you can generate relative timestamps. I.e. timestamps that are
related to each other. That is useful to have arrival time generated after the departure time.
The punchline flight_punchline.yaml
receives data on a TCP sockets, parse the key value format to convert it into a map,
then insert it into a destination flights
table. The flight
table must be created first.
Here is its content:
version: '6.0'
runtime: storm
type: punchline
dag:
# Syslog input
- type: syslog_input
settings:
listen:
proto: tcp
host: 0.0.0.0
port: 9909
publish:
- stream: logs
fields:
- log
# Punchlet node
- component: punchlet
type: punchlet_node
settings:
punchlet_code: '{kv().on([logs][log]).into([logs]);}'
subscribe:
- component: syslog_input
stream: logs
publish:
- stream: logs
fields:
- event_timestamp
- uniquecarrier_id
- departure_timestamp
- arrival_timestamp
- flight_consumption
- flight_ip
# Clickhouse output
- component: output
type: clickhouse_output
settings:
hosts:
- localhost:8123
username: default
password: ''
database: default
table: flights
bulk_size: 10
column_names:
- event_timestamp
- uniquecarrier_id
- departure_timestamp
- arrival_timestamp
- flight_consumption
- flight_ip
subscribe:
- component: punchlet
stream: logs
# Topology metrics
metrics:
reporters:
- type: kafka
# Topology settings
settings:
topology.component.resources.onheap.memory.mb: 100
create table flights (
uniquecarrier_id UUID, \
event_timestamp DateTime, \
arrival_timestamp DateTime, \
departure_timestamp DateTime, \
flight_consumption UInt32, \
flight_ip IPv4) \
ENGINE = MergeTree \
PARTITION BY toYYYYMMDD(event_timestamp) \
ORDER BY event_timestamp \
;
To launch the clickhouse client run this:
$PUNCHPLATFORM_CLICKHOUSE_INSTALL_DIR/usr/bin/clickhouse client --port 9100
Now that your table has been created, use channelctl
to start our example:
channelctl start --channel flights
Inject some data:
punchplatform-log-injector.sh -c flight_injector.json -n 10
And check you have your data in clickhouse:
$PUNCHPLATFORM_CLICKHOUSE_INSTALL_DIR/usr/bin/clickhouse client --port 9100
:) select * from flights
Query id: 56caa712-495e-4f56-ba1b-91ad065b5e54
┌─────────────────────uniquecarrier_id─┬─────event_timestamp─┬───arrival_timestamp─┬─departure_timestamp─┬─flight_consumption─┬─flight_ip──────┐
│ c68a3b35-3fbf-4e42-9b59-62ff73c4268b │ 2021-02-14 09:13:58 │ 2020-02-03 11:20:53 │ 2020-02-01 01:00:00 │ 22569 │ 192.168.99.79 │
│ 4e6ad8e0-307c-4f38-a9c5-a0b0ec4f33aa │ 2021-02-14 09:13:58 │ 2020-02-04 09:32:47 │ 2020-02-01 02:00:00 │ 2826 │ 192.168.99.86 │
│ 785873e3-3ac0-4527-8acb-78813cf865a4 │ 2021-02-14 09:13:58 │ 2020-02-02 07:46:39 │ 2020-02-01 03:00:00 │ 22382 │ 192.168.99.207 │
│ bf6a5983-b911-4210-aad1-3d52533e1d70 │ 2021-02-14 09:13:58 │ 2020-02-04 21:54:59 │ 2020-02-01 04:00:00 │ 10480 │ 192.168.99.129 │
│ b6550c4b-fb13-460d-b88b-7b967244a039 │ 2021-02-14 09:13:58 │ 2020-02-02 05:16:52 │ 2020-02-01 05:00:00 │ 14923 │ 192.168.99.20 │
│ e4102d0d-372f-41c2-9a1f-67ef96de1718 │ 2021-02-14 09:13:58 │ 2020-02-03 21:32:07 │ 2020-02-01 06:00:00 │ 29558 │ 192.168.99.227 │
│ df494d46-e020-4d0c-a804-c4c1476a48ba │ 2021-02-14 09:13:58 │ 2020-02-01 13:08:00 │ 2020-02-01 07:00:00 │ 3284 │ 192.168.99.72 │
│ 9bcfab97-8e92-4d67-a0fb-ca45827c4cc1 │ 2021-02-14 09:13:58 │ 2020-02-04 04:00:15 │ 2020-02-01 08:00:00 │ 15636 │ 192.168.99.3 │
│ a0cdf9ef-1b06-4ab4-9fda-50a464534ec0 │ 2021-02-14 09:13:58 │ 2020-02-02 06:05:24 │ 2020-02-01 09:00:00 │ 27845 │ 192.168.99.235 │
│ 1b234b9b-98fb-4436-8ebc-6dd49c1220b0 │ 2021-02-14 09:13:58 │ 2020-02-01 15:00:03 │ 2020-02-01 10:00:00 │ 13438 │ 192.168.99.61 │
└──────────────────────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────────────┴────────────────┘
10 rows in set. Elapsed: 0.002 sec.
Refer to the clickhouse output node for the configuration options.