punchplatform-log-injector.sh

NAME

punchplatform-log-injector.sh - a configurable log injector for testing message processing platforms

SYNOPSIS

punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector -n 100 -t 50

DESCRIPTION

punchplatform-log-injector.sh is a fully configurable log injector. You can use it to inject test messages into any message processing platform. To do so, you define JSON injection configuration files that contain the load characteristics, the log message format, and the destination input point.

The injector is capable of writing to sockets (udp, tcp), to Kafka, to Elasticsearch, to Clickhouse and to lumberjack endpoints.

The injector can also play a server role. You can use it to bench your networking plane, or to test a punchplatform. It can read from tcp, udp, lumberjack and Kafka.

OPTIONS

-brokers <arg> :
- the Kafka broker to read from.
-c <campaign.json> | -c <campaign1.json>,<campaign2.json> | -c <campaign directory> :
- Run a single or several campaigns, or all campaigns found in the specified directory. If you run several campaigns, each will be run using a dedicated thread.
-check,--dump :
- dump to stdout instead of injecting.
-cl | --lumberjack-client :
- Starts running as a Lumberjack client. You must set the server host and port number using the host and port options. By default the lumberjack client sends 32-byte strings. If you want to simulate real traffic you must define an injection configuration file.
-d | --delay <value_in_ms> :
- overrides injection file throughput or inter message delay, in milliseconds. This can be used to inject very low traffic rates.
-earliest :
- start consuming Kafka messages from the earliest offset.
-h,--help :
- print this message.
-H,--host <arg> :
- overrides injection file destination host.
-k | --kafka-consumer :
- Starts running as Kafka consumer. You must set the topic and broker options.
-latest :
- start consuming Kafka messages from the latest offset.
-n | --number <message_number> :
- Exits after that many messages have been sent.
-p | --port <port> :
- Sets the (destination or listening) port number. This overrides the port defined in the configuration file, if any.
-punchlets,--punchlets <arg> :
- stress a chain of punchlets (comma separated).
-resources,--resources <arg> :
- add punchlet resources (comma separated).
-q,--silent :
- reduce verbosity to error messages.
-sl | --lumberjack-server :
- Starts running as a Lumberjack server. You must set the port number using the port option.
-st | --tcp-server :
- Starts running as a plain TCP server. You must set the port option.
-stream :
- Defines the Storm stream for injected logs.
-t | --throughput <rate> :
- Defines the traffic rate, overriding the rate defined in the configuration file, if any.
--thread <thread-number> :
- By default each injection is single-threaded. To simulate several connections to the server, increase the number of threads. Each thread will take a part of the total throughput defined in your scenario.
-it, --inactivity-timeout <timeout string> :
- Inactivity duration before the injector exits. Defaults to infinity.
-topic <arg> :
- the kafka topic.
-ts,--tcp-server <arg> :
- act as tcp server to count the number of received logs. You must set the port number.
-u,--udp:
- use udp.
-us,--udp-server <arg>:
- act as udp server to count the number of received logs. You must set the port number.
-v,--verbose :
- prints out the read data. It only works with some senders or receivers.
-w,--connection-timeout <arg> :
- defines maximum wait time in ms for the receiver port to be available (not in udp mode) - 0 (default value) means infinite wait. Also applies on reconnection after connection loss.
--sustain :
- This option is relevant only for the lumberjack client. The client will send increasing traffic to the server, and will stop when the window of unacknowledged messages reaches 1000. This lets you easily check the bandwidth of your system.
-lj,--lumberjack-json-fields-payload :
- Instead of the payload message being injected as a string in the 'log' field, the provided payload message is expected to be a JSON string that defines the root fields and values of the lumberjack frame. This allows using a field other than 'log', or providing multi-field lumberjack frames.
-cp, --compression:
- Enable compression for Lumberjack protocol (option valid for Lumberjack server and client).
--ssl_private_key :
- This option is relevant only for the lumberjack client and server. Specify a private key path.
--ssl_certificate :
- This option is relevant only for the lumberjack client and server. Specify a certificate path.
--ssl_protocol :
- This option is relevant only for the lumberjack client and server. Specify the SSL protocol among TLSv1.2 (default), TLSv1.1, TLSv1.0.
--ssl_provider :
- This option is relevant only for the lumberjack client. Specify an SSL provider among JDK (default), OPENSSL, OPENSSL_REFCNT.
--ssl_ciphers :
- This option is relevant only for the lumberjack client. Overrides the SSL ciphers; the provider's cipher suites are used by default. Specify a comma-separated list of custom ciphers, for example: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA

EXAMPLES

Inject apache traffic. The destination and load characteristics are defined in an injection file.

punchplatform-log-injector.sh -c apache_injection.json

Same, but changing the rate to 1500 messages per second on stream [logs][log]:

punchplatform-log-injector.sh -c apache_injection.json --throughput 1500 --stream [logs][log]

Running a lumberjack server listening on tcp/lumberjack port 21212

punchplatform-log-injector.sh --lumberjack-server --port 21212 --compression

Sending lumberjack traffic to the server we just started

punchplatform-log-injector.sh -c lumberjack_injector.json -t 1000 --compression

Same with SSL

punchplatform-log-injector.sh --lumberjack-server --port 21212 --compression \
  --ssl_private_key conf/resources/ssl/server-key.pem \
  --ssl_certificate conf/resources/ssl/server-cert.pem

punchplatform-log-injector.sh -c lumberjack_injector.json -t 1000 --compression \
  --ssl_private_key conf/resources/ssl/server-key.pem \
  --ssl_certificate conf/resources/ssl/server-cert.pem

Note: you have to specify the protocol in the injector configuration file, by setting the destination "proto" field to lumberjack. If you need an example, check the $PUNCHPLATFORM_CONF_DIR/conf/resources/injector/examples/ directory. Take care: you cannot combine the --punchlets parameter with lumberjack. In that case the injector only sends the data to the punchlets, without taking the protocol into account.

Checking the traffic received from a Kafka topic. Note that the 'local' Kafka broker must be defined in your punchplatform.properties file, i.e. this option only works on an installed punchplatform node.

punchplatform-log-injector.sh --kafka-consumer --topic apache --brokers local -v

Test or stress a punchlets pipeline to see overall performance

punchplatform-log-injector.sh -c [injection_file].json --punchlets p1.punch,p2.punch,... --resources r1.json,r2.json,...

Sending lumberjack traffic to the server we just started with custom SSL configuration

punchplatform-log-injector.sh -c lumberjack_injector.json -t 1000 \
--ssl_private_key resources/ssl/certs/gateway/gateway-super-1-key-pkcs8.pem \
--ssl_certificate resources/ssl/certs/gateway/gateway-super-1-cert.pem \
--ssl_protocol TLSv1.2 \
--ssl_provider JDK \
--ssl_ciphers TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA

Configuration file

The injection file is a JSON file, in which you are free to add '#' prefixed comments.

The various sections are described below.
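
Standard JSON parsers reject '#' comments, so inspecting or validating an injection file with your own tooling requires stripping them first. The Python sketch below shows one possible way to do that. It is not part of the injector: the function name `parse_injection_text` is hypothetical, and it assumes comments occupy whole lines, as in the examples of this page.

```python
import json

def parse_injection_text(text):
    """Parse injection-file text after dropping whole-line '#' comments.

    Assumption: a comment is a line whose first non-blank character is '#'.
    Comments placed after JSON content on the same line are not handled.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))

sample = """
{
    # where to send the generated data
    "destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9999 },
    # how fast to send it
    "load" : { "message_throughput" : 1000, "total_messages" : 1000000 }
}
"""

conf = parse_injection_text(sample)
print(conf["destination"]["proto"], conf["load"]["message_throughput"])
```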

Destination Section

In your injection file you can set the data destination, i.e. where you want to send the generated data. This section is optional: you can instead define the destination using command-line parameters. If you do set one in the file, you can still override it from the command line. Here is an example that sends the generated data to a TCP server.

{
    "destination" : { 
        "proto" : "tcp", 
        "host" : "127.0.0.1", 
        "port" : 9999 
    },
...
}

The supported destinations are:

tcp: send the data to TCP server
udp: send the data to UDP server
lumberjack: send the data to a Lumberjack server
http: performs POST REST requests to an http server
stdout: just print out the generated data. Use it for debugging purposes and copy-pasting.
kafka: act as a kafka producer, toward a given topic
elasticsearch: inject data directly to an Elasticsearch cluster.

Here are example configurations for the "destination" section:

# all these require plain host port parameters
{ "proto" : "tcp", "host" : "127.0.0.1", "port" : 9999 }
{ "proto" : "udp", "host" : "127.0.0.1", "port" : 9999 }
{ "proto" : "lumberjack", "host" : "127.0.0.1", "port" : 9999, "compression": false }
{ "proto" : "http", "host" : "127.0.0.1", "port" : 9999, 
    "http_method": "POST", 
    "http_root_url": "/", 
    "bulk_size": 1 
}

# Elasticsearch configuration ('port' is optional, 'bulk_size' default is 1)
{ 
    "proto" : "elasticsearch", "host": "127.0.0.1", "port": 9300, 
    "cluster_name" : "es_search", 
    "index": "test", 
    "type": "doc", 
    "bulk_size": 1000 }

# Kafka only accepts a "brokers" name that must be defined in your 
# punchplatform.properties file. That is : this option only works
# (as of today) on an installed punchplatform.
{ 
    "proto": "kafka", 
    "brokers": "local", 
    "topic": "mytenant_bluecoat_proxysg"
}
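
Before launching a campaign it can help to check that a destination block carries the keys its protocol needs. The Python sketch below is only an illustration: the `REQUIRED_KEYS` table and the `missing_destination_keys` helper are hypothetical, and the required keys are an assumption read off the examples above, not a specification of what the injector enforces.

```python
# Assumption: required keys inferred from the example configurations above.
REQUIRED_KEYS = {
    "tcp": {"host", "port"},
    "udp": {"host", "port"},
    "lumberjack": {"host", "port"},
    "http": {"host", "port"},
    "elasticsearch": {"host"},      # 'port' is documented as optional
    "kafka": {"brokers", "topic"},
    "stdout": set(),                # assumed to need nothing else
}

def missing_destination_keys(destination):
    """Return the sorted list of keys missing from a destination block."""
    proto = destination.get("proto")
    if proto not in REQUIRED_KEYS:
        raise ValueError(f"unsupported proto: {proto!r}")
    return sorted(REQUIRED_KEYS[proto] - set(destination))

print(missing_destination_keys({"proto": "tcp", "host": "127.0.0.1", "port": 9999}))
print(missing_destination_keys({"proto": "kafka", "brokers": "local"}))
```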

Load Section

This section lets you control the injector's throughput. It is also optional if you prefer using command-line parameters.

"load" :{

    # "message_throughput" indicates the number of messages per second.
    # Sometimes you want to inject fewer than 1 message per second, 
    # you can then use the alternative property : "inter_message_delay" 
    # For example to inject one message every 30 seconds :
    #   "inter_message_delay" : 30
    # 
    "message_throughput" : 1000,

    # Optional : control how often you have a recap stdout message. 
    "stats_publish_interval" : "2s",

    # The total number of messages. Use -1 for an almost infinite run (2³¹-1 messages). 
    "total_messages" : 1000000,

    # Optional : make your throughput fixed or variable. Fixed by default.
    # Using "variable" makes your load vary between 50 and 150 % of your 
    # set throughput.
    "type" : "fixed"
}
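
To get a feel for what a load section implies, the following Python sketch converts it into a pause between messages and an expected campaign duration. It only illustrates the arithmetic: the helper name `seconds_between_messages` is hypothetical, and drawing the variable rate uniformly between 50 % and 150 % is an assumption about how "variable" behaves.

```python
import random

def seconds_between_messages(load, rng=random):
    """Return the pause before the next message, in seconds."""
    rate = float(load["message_throughput"])
    if load.get("type", "fixed") == "variable":
        # Assumption: the rate is drawn uniformly between 50 % and 150 %.
        rate *= rng.uniform(0.5, 1.5)
    return 1.0 / rate

load = {"message_throughput": 1000, "total_messages": 1000000, "type": "fixed"}
pause = seconds_between_messages(load)
expected_duration = load["total_messages"] / load["message_throughput"]
print(f"{pause * 1000:.1f} ms between messages, about {expected_duration:.0f} s in total")
```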

Punchlets Performance Test

The injector is great for stressing one punchlet, or a chain of punchlets, under a high load of data. Using the --punchlets argument you basically make a chain of punchlets traversed by tons of (representative) data.

To check everything runs fine before stressing the punchlets, use the "-v" option to dump the punchlet result. Again, the -t option is your friend here to do that slowly:

punchplatform-log-injector.sh -c <json-injection-file> --punchlets punchlet1,punchlet2,.. -t 1 -v

If you need to include punchlet resources, use --resources option

punchplatform-log-injector.sh -c <json-injection-file> \
    --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,... \
    --resources standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json \
    --dump

Note on punchlet performance: on an Intel Core i7 2.5 GHz you should expect (in Keps, thousands of events per second):
- running the injection without doing anything: 730 Keps
- running the injection with the input tuple creation only: 670 Keps
- running the injection with the punchlets: 30 Keps

Message Section

This mandatory section contains the payload sent by the log injector.

"message" : {

    # the payloads are templates of what you inject. In there you 
    # can insert %{} variable fields that will be replaced by the 
    # corresponding element you define in the "fields" section 
    # described right next. For example here, %{src_ip} will be replaced 
    # by the "src_ip" field.
    #
    # You can define a single payload. Should you define several 
    # ones like illustrated here, the injector will simply round-robin 
    # on each one.
    #
    # Every time a message is generated, each %{} variable field is 
    # replaced by a new value.
    #
    # You can thus finely control what your output data will look like.
    #

    "payloads" : [
        "%{timestamp}: New session from IP %{src_ip} UUID %{uuid}.",
        "%{timestamp}: %{owner} visited URL %{url} %{nb_visits} times.",
        "%{timestamp}: %{owner} also uploaded %{outbytes}kb and downloaded %{inbytes}kb."
    ],

    # The fields sections lets you define various kind of generated 
    # values. In the following all the supported injector fields are 
    # described.

    "fields" : {

        "src_ip" : {
            # Generate IPV4 addresses. 
            "type" : "ipv4",

            # You use brackets to control what part of the address 
            # you want to make variable. Here all of them. 
            "format" : "[0-255].[0-255].[0-255].[0-255]"
        },
        "url" : {

            # Take the values from a list. Every time a value is 
            # generated you get the next element of your list.
            "type" : "list",

            # Here is your list. 
            "content" : [
                "GET /ref/index.html HTTP/1.1", 
                "GET /yet/another.html.css HTTP/1.1"
                ]
        },

        "owner" : {
            "type" : "list",
            "content" : ["frank", "bob", "alice", "ted", 
                            "dimi", "ced", "phil", "julien"],

            # This time we want to iterate differently. We want to 
            # send "frank" 3 times then "bob" 3 times and so on. 
            "update_every_loop": false,
            "update_every": 3
        },

        "uuid": {
            # Generate a valid unique string identifier
            "type": "session_id"
        },

        "nb_visits" : {
            "type" : "counter",
            "min" : 0,
            "max" : 12
        },
        "inbytes" : {
            "type" : "random",
            "min" : 1000,
            "max" : 30000
        },
        "outbytes" : {
            "type" : "gaussian",
            "mean": 200.0,
            "deviation" : 30.0,
            "mantissa_precision": 2,
            "always_positive": true
        },
        "timestamp" : {
            "type" : "timestamp",
            "format" :  "dd/MMM/yyyy:HH:mm Z",
            "start_time" : "2012.12.31",
            "start_time_format" : "yyyy.MM.dd",
            "tick_interval" : "1h"
        }
    }
}
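
The templating described in the comments above can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions, not the injector's implementation: the `render` function and the `FIELD` pattern are hypothetical names, each field is modelled as a plain Python iterator, and only the fields that appear in the current payload are advanced.

```python
import itertools
import re

# Matches %{name} placeholders in a payload template.
FIELD = re.compile(r"%\{(\w+)\}")

def render(payloads, generators, count):
    """Yield `count` messages, round-robining over the payload templates."""
    templates = itertools.cycle(payloads)
    for _ in range(count):
        template = next(templates)
        # Each placeholder is replaced by the next value of its field.
        yield FIELD.sub(lambda m: str(next(generators[m.group(1)])), template)

generators = {
    "owner": itertools.cycle(["frank", "bob", "alice"]),   # a "list" field
    "nb_visits": itertools.cycle(range(0, 13)),            # a "counter" from 0 to 12
}
payloads = [
    "%{owner} logged in.",
    "%{owner} visited %{nb_visits} times.",
]
messages = list(render(payloads, generators, 4))
for message in messages:
    print(message)
```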

In many cases you want to send JSON payloads. You can use embedded JSON to make it easier. An example explains it all:

"message" : {

    "payloads" : [
        { 
            "time" : "%{timestamp}", 
            "aNumber" : %{number} 
        }
    ],
    ...
}

Note

The resulting file is no longer valid JSON, because %{number} would need to be enclosed in quotes. The log injector deals with it, but this assumes that the field generates a numerical or boolean value.

The supported template field types are:

ipv4 : to generate ipv4 addresses
list : to loop over a set of items
counter : an iterating numeric value
random : a random numeric value following uniform probability density.
gaussian : a random value following a gaussian probability density.
session_id : a short, unique string identifier
uuid : a standard UUID
timestamp : a timestamp, for which you fully control the format, the start time, and the tick interval. You can also refer the start time of one timestamp to another.

Loop control

Whatever the type, you can control the value generation using the following optional parameters:

update_every_loop : boolean
- controls how the field is updated: either at each loop (true), or once every update_every loops (false). Note that if set to false, the update_every parameter is mandatory.
- default: true
update_every : int
- the number of loop iterations before the generated value is changed.
- default: 1
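
As an illustration of these two parameters, the Python sketch below reproduces the "frank 3 times, then bob 3 times" behaviour described for the "owner" field in the Message Section. The generator name `looped` is hypothetical and the code is a model of the documented behaviour, not the injector's own logic.

```python
import itertools

def looped(values, update_every=1):
    """Cycle over `values`, holding each one for `update_every` loops."""
    while True:
        for value in values:
            for _ in range(update_every):
                yield value

# Equivalent of "update_every_loop": false, "update_every": 3
owners = list(itertools.islice(looped(["frank", "bob", "alice"], 3), 7))
print(owners)
```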

session_id

Using the session_id type you can generate short yet unique string ids. These ids are similar to YouTube or Elasticsearch ids. An example value is 'VVkgmncB4VD0JSomcivo'.

    "uniquecarrier_id": {
        "type": "session_id"
    }

uuid

Using the uuid type you can generate a standard UUID.

    "uniquecarrier_id": {
        "type": "uuid"
    }

list

content : array
- an array of values the injector will loop over.
- example: [ 1, 2, 3 ], [ "hello", "world" ]

counter

min : the (inclusive) min value
max : the (inclusive) max value

random

min : the (exclusive) min value
max : the (exclusive) max value

gaussian

mean : int
- the mean value of the distribution.
- default: 0
deviation : int
- the standard deviation. Note: this means that about 68% of the values fall within mean ± deviation.
- default: 1
mantissa_precision : int
- number of digits after the decimal point. If set to 0, the decimal point '.' itself is removed (integer).
- default: 0
always_positive : boolean
- only generate positive values. Note that the gaussian is then also cropped at 2 * mean to keep the mean value intact.
- default: true
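
The effect of these parameters can be explored with a small Python model. It is a sketch only: `gaussian_value` is a hypothetical name, and "cropped" is modelled here by redrawing any value outside [0, 2 * mean], which is an assumption about the injector's behaviour.

```python
import random

def gaussian_value(mean, deviation, mantissa_precision=0,
                   always_positive=True, rng=random):
    """Return one gaussian sample formatted with a fixed number of decimals."""
    while True:
        value = rng.gauss(mean, deviation)
        # Assumption: "cropped" is modelled by redrawing values outside
        # [0, 2 * mean], which keeps the distribution symmetric around mean.
        if not always_positive or 0.0 <= value <= 2.0 * mean:
            break
    # With mantissa_precision == 0 the result contains no '.' at all.
    return f"{value:.{mantissa_precision}f}"

rng = random.Random(42)
samples = [float(gaussian_value(200.0, 30.0, 2, rng=rng)) for _ in range(10000)]
within_one_deviation = sum(170.0 <= s <= 230.0 for s in samples) / len(samples)
print(f"share within mean ± deviation: {within_one_deviation:.2f}")
```

With the "outbytes" settings of the Message Section (mean 200.0, deviation 30.0), the printed share should be close to the 68% mentioned above.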

timestamp

Using the timestamp field you can generate times in the format you need. Here is a simple explicit example:

            "departure_timestamp" : {
                "type" : "timestamp",
                "format" :  "dd/MMM/yyyy:HH:mm Z",
                "start_time" : "2012.12.31",
                "start_time_format" : "yyyy.MM.dd",
                "tick_interval" : "1h"
            }

That will produce :

departure_timestamp=31/Dec/2012:01:00 +0100
departure_timestamp=31/Dec/2012:02:00 +0100
...

Making a timestamp start from another one is handy to generate timestamps that represent a time interval. Say you want to add an "arrival" timestamp based on your departure timestamp plus a random duration. Here is how you do it.

            "departure_timestamp" : {
                "type" : "timestamp",
                "format" :  "dd/MMM/yyyy:HH:mm Z",
                "start_time" : "2012.12.31",
                "start_time_format" : "yyyy.MM.dd",
                "tick_interval" : "1h"
            },
            "arrival_timestamp" : {
                "type" : "timestamp",
                "format" :  "dd/MMM/yyyy:HH:mm Z",
                "relative_start_time" : "departure_timestamp",
                "duration" : {
                    "type" : "random",
                    "unit" : "minute",
                    "min" : 120,
                    "max" : 360
                }
            }

You will get:

departure_timestamp=31/Dec/2012:06:00 +0100 arrival_timestamp=02/Jan/2013:14:23 +0100
departure_timestamp=31/Dec/2012:07:00 +0100 arrival_timestamp=31/Dec/2012:14:30 +0100
departure_timestamp=31/Dec/2012:08:00 +0100 arrival_timestamp=03/Jan/2013:14:38 +0100
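
The relationship between the two fields can be modelled in Python as follows. This is a hedged sketch, not the injector's code: `departures_and_arrivals` and `OUT_FORMAT` are hypothetical names, the +0100 offset and the first value being one tick after the start time are taken from the sample output above, and the strftime pattern only approximates the dd/MMM/yyyy:HH:mm Z format.

```python
import random
from datetime import datetime, timedelta, timezone

# strftime approximation of the "dd/MMM/yyyy:HH:mm Z" format.
OUT_FORMAT = "%d/%b/%Y:%H:%M %z"

def departures_and_arrivals(count, rng=random):
    """Yield (departure, arrival) strings, one pair per generated message."""
    tz = timezone(timedelta(hours=1))   # assumption: +0100, as in the sample output
    start = datetime.strptime("2012.12.31", "%Y.%m.%d").replace(tzinfo=tz)
    for tick in range(1, count + 1):
        # "tick_interval" : "1h" moves the departure forward by one hour each time.
        departure = start + tick * timedelta(hours=1)
        # "duration": random minutes; min and max are documented as exclusive.
        arrival = departure + timedelta(minutes=rng.randint(121, 359))
        yield departure.strftime(OUT_FORMAT), arrival.strftime(OUT_FORMAT)

pairs = list(departures_and_arrivals(3, random.Random(0)))
for departure, arrival in pairs:
    print(f"departure_timestamp={departure} arrival_timestamp={arrival}")
```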

Return codes

The punchplatform-log-injector utility exits 0 on success, and >0 if an error occurs.

Environment

The following environment variables affect the execution of punchplatform-log-injector.sh:

PUNCHPLATFORM_CONF_DIR
- The PUNCHPLATFORM_CONF_DIR environment variable indicates the directory where tenant and channel configuration files are stored. A 'tenants' subdirectory is expected. Below it you will find a tenant then channel directory tree.

Bugs

No known bugs.