punchplatform-log-injector.sh

NAME

punchplatform-log-injector.sh - a fully configurable log injector

SYNOPSIS

punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector -n 100 -t 50

DESCRIPTION

punchplatform-log-injector.sh is a fully configurable log injector. You can use it to inject test messages into any message processing platform. To do so, you define JSON injection configuration files that describe the load characteristics, the log message format, and the destination input point.

The injector is capable of writing to sockets (udp, tcp), to Kafka, and to lumberjack endpoints.

The injector can also play a server role. You can use it to bench your networking plane, or to test a punchplatform. It can read from tcp, udp, lumberjack and Kafka.

OPTIONS

  • -brokers <arg> :

    • the Kafka broker to read from.
  • -c <campaign.json> | -c <campaign1.json>,<campaign2.json> | -c <campaign directory> :

    • Run a single or several campaigns, or all campaigns found in the specified directory. If you run several campaigns, each will be run using a dedicated thread.
  • -check,--dump :

    • dump to stdout instead of injecting.
  • -cl | --lumberjack-client :

    • Starts running as a Lumberjack client. You must set the server host and port number using the host and port options. By default the lumberjack client sends 32-byte strings. If you want to simulate real traffic, you must define an injection configuration file.
  • -d | --delay <value_in_ms> :

    • overrides the injection file throughput or inter-message delay, in milliseconds. This can be used to inject very low traffic rates.
  • -earliest :

    • start consuming Kafka messages from the earliest offset.
  • -h,--help :

    • print this message.
  • -H,--host <arg> :

    • overrides the injection file destination host.
  • -k | --kafka-consumer :

    • Starts running as a Kafka consumer. You must set the topic and brokers options.
  • -latest :

    • start consuming Kafka messages from the latest offset.
  • -n | --number <message_number> :

    • Exits after that many messages have been sent.
  • -p | --port <port> :

    • Set the (destination or listening) port number. This overrides the port defined in the configuration file, if any.
  • -punchlets,--punchlets <arg> :

    • stress a chain of punchlets (comma separated).
  • -resources,--resources <arg> :

    • add punchlet resources (comma separated).
  • -q,--silent :

    • reduce verbosity to error messages.
  • -sl | --lumberjack-server :

    • Starts running as a Lumberjack server. You must set the port number using the port option.
  • -st | --tcp-server :

    • Starts running as a plain TCP server. You must set the port option.
  • -stream :

    • Defines the Storm stream for the injected logs.
  • -t | --throughput <rate> :

    • Defines the traffic rate, overriding the rate defined in the configuration file, if any.
  • --thread <thread-number> :

    • By default each injection is single-threaded. To simulate several connections to the server, increase the number of threads. Each thread takes a part of the total throughput defined in your scenario.
  • -topic <arg> :

    • the kafka topic.
  • -ts,--tcp-server <arg> :

    • act as tcp server to count the number of received logs. You must set the port number.
  • -u,--udp:

    • use udp.
  • -us,--udp-server <arg>:

    • act as udp server to count the number of received logs. You must set the port number.
  • -v,--verbose :

    • prints out the data read. It only works with some senders and receivers.
  • -w,--connection-timeout <arg> :

    • defines the maximum wait time in ms for the receiver port to become available (not in udp mode). 0 (the default value) means infinite wait. Also applies to reconnection after a connection loss.
  • --sustain :

    • This option is relevant only for the lumberjack client. The client will send increasing traffic to the server, and will stop when the window of unacknowledged messages reaches 1000. This lets you easily check the bandwidth of your system.

Examples

Inject apache traffic. The destination and load characteristics are defined in an injection file:

$ punchplatform-log-injector.sh -c apache_injection.json

Same, but changing the rate to 1500 messages per second on stream [logs][log]:

$ punchplatform-log-injector.sh -c apache_injection.json --throughput 1500 --stream [logs][log]

Running a lumberjack server listening on tcp/lumberjack port 21212:

$ punchplatform-log-injector.sh --lumberjack-server --port 21212

Sending lumberjack traffic to the server we just started:

$ punchplatform-log-injector.sh -c lumberjack_injector.json -t 1000
Note

You must specify the protocol in the injector configuration file, by setting the proto field to lumberjack. If you need an example, check the $PUNCHPLATFORM_CONF_DIR/conf/resources/injector/examples/ directory. Take care: you cannot combine the --punchlets parameter with lumberjack. In that case the injector only sends the data to the punchlets, without taking the protocol into account.

Checking the traffic received from a Kafka topic. Note that the 'local' Kafka broker must be defined in your punchplatform.properties file, i.e. this option only works on an installed punchplatform node.

$ punchplatform-log-injector.sh --kafka-consumer --topic apache --brokers local -v

Test or stress a punchlets pipeline to see its overall performance:

$ punchplatform-log-injector.sh -c [injection_file].json --punchlets p1.punch,p2.punch,... --resources r1.json,r2.json,...

Configuration file

The injection file is a JSON file, in which you are free to add '#'-prefixed comments.

The various sections are described below.
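Because of the '#' comments, an injection file is not strictly valid JSON. If you want to inspect or generate injection files from your own tooling, the following minimal sketch shows one way to read them. It assumes comments always occupy a whole line, as in the examples of this page; `load_injection_file` is a hypothetical helper, not part of the punchplatform tooling.

```python
import json


def load_injection_file(text: str) -> dict:
    """Parse an injection file, ignoring whole-line '#' comments.

    Assumption: a comment always sits on its own line. A '#' that
    appears inside a JSON string value is left untouched.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))


sample = """
{
    # where to send the generated data
    "destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9999 },
    # how fast to send it
    "load" : { "message_throughput" : 1000, "total_messages" : 1000000 }
}
"""

config = load_injection_file(sample)
print(config["destination"]["proto"], config["load"]["message_throughput"])
```

Filtering whole lines rather than cutting at the first '#' keeps payloads that legitimately contain a '#' character intact.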

Destination Section

You can set the data destination in your injection file, i.e. where you want to send your generated data. This section is optional: you can define the destination using command-line parameters instead. If you do set one, you can still override it using command-line parameters. Here is an example that sends your generated data to a TCP server.

{
    "destination" : { 
        "proto" : "tcp", 
        "host" : "127.0.0.1", 
        "port" : 9999 
    },
...
}

The supported destinations are:

  • tcp: send the data to a TCP server
  • udp: send the data to a UDP server
  • lumberjack: send the data to a Lumberjack server
  • http: perform POST REST requests to an HTTP server
  • stdout: just print out the generated data. Useful for debugging and for copy-pasting.
  • kafka: act as a kafka producer, toward a given topic
  • elasticsearch: inject data directly to an Elasticsearch cluster.

Here are example configurations for the "destination" section:

# all these require plain host port parameters
{ "proto" : "tcp", "host" : "127.0.0.1", "port" : 9999 }
{ "proto" : "udp", "host" : "127.0.0.1", "port" : 9999 }
{ "proto" : "lumberjack", "host" : "127.0.0.1", "port" : 9999 }
{ "proto" : "http", "host" : "127.0.0.1", "port" : 9999, 
    "http_method": "POST", 
    "http_root_url": "/", 
    "bulk_size": 1 
}

# Elasticsearch configuration ('port' is optional, 'bulk_size' default is 1)
{ 
    "proto" : "elasticsearch", "host": "127.0.0.1", "port": 9300, 
    "cluster_name" : "es_search", 
    "index": "test", 
    "type": "doc", 
    "bulk_size": 1000 }

# Kafka only accepts a "brokers" name that must be defined in your 
# punchplatform.properties file. That is : this option only works
# (as of today) on an installed punchplatform.
{ 
    "proto": "kafka", 
    "brokers": "local", 
    "topic": "mytenant_bluecoat_proxysg"
}

Load Section

This section lets you control the injector's throughput. It is also optional if you prefer using command-line parameters.

"load" :{

    # "message_throughput" indicates the number of messages per second.
    # Sometimes you want to inject fewer than 1 message per second; 
    # you can then use the alternative property "inter_message_delay". 
    # For example, to inject one message every 30 seconds:
    #   "inter_message_delay" : 30
    # 
    "message_throughput" : 1000,

    # Optional: control how often a recap message is printed to stdout. 
    "stats_publish_interval" : "2s",

    # The total number of messages. Use -1 for almost infinite (2³¹-1 messages). 
    "total_messages" : 1000000,

    # Optional: make your throughput fixed or variable. By default fixed.
    # Using "variable" makes your load vary between 50 and 150 % of your 
    # set throughput.
    "type" : "fixed"
}
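To get a feel for what the load parameters imply, here is a small arithmetic sketch. It only restates the relationships described above. The function names are hypothetical, and the uniform draw used for the "variable" type is an assumption: the comments above only state the 50 to 150 % range, not the distribution the injector actually uses.

```python
import random


def inter_message_delay_s(message_throughput: float) -> float:
    """Average pause between two messages, in seconds."""
    return 1.0 / message_throughput


def campaign_duration_s(total_messages: int, message_throughput: float) -> float:
    """Time needed to send total_messages at a fixed throughput."""
    return total_messages / message_throughput


def variable_throughput(message_throughput: float, rng: random.Random) -> float:
    """One instantaneous rate for a "variable" load.

    Assumption: the rate is drawn uniformly between 50 % and 150 %
    of the configured throughput.
    """
    return message_throughput * rng.uniform(0.5, 1.5)


print(inter_message_delay_s(1000))           # 0.001 second between messages
print(campaign_duration_s(1_000_000, 1000))  # 1000.0 seconds for the campaign
```

With the values of the example above (1000 messages per second, 1000000 messages in total), a fixed-rate campaign therefore lasts a little under 17 minutes.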

Punchlets Performance Test

The injector is well suited to stressing one punchlet, or a chain of punchlets, under a high data load. With the --punchlets argument, you run large volumes of (representative) data through a chain of punchlets.

To check that everything runs fine before stressing the punchlets, use the -v option to dump the punchlet results. Again, the -t option is your friend here, to do that slowly:

$ punchplatform-log-injector.sh -c <json-injection-file> --punchlets punchlet1,punchlet2,.. -t 1 -v

If you need to include punchlet resources, use the --resources option:

$ punchplatform-log-injector.sh -c <json-injection-file> \
    --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,... \
    --resources standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json \
    --dump
  • Note on punchlet performance: on an Intel Core i7 2.5 GHz you should expect:
    • running the injection without doing anything: 730 Keps
    • running the injection with the input tuple creation only: 670 Keps
    • running the injection with the punchlets: 30 Keps

Message Section

This mandatory section contains the payload sent by the log injector.

"message" : {

    # The payloads are templates of what you inject. In there you 
    # can insert %{} variable fields that will be replaced by the 
    # corresponding element you define in the "fields" section 
    # described right next. For example here, %{src_ip} will be replaced 
    # by the "src_ip" field.
    #
    # You can define a single payload. Should you define several 
    # ones like illustrated here, the injector will simply round-robin 
    # on each one.
    #
    # Every time a message is generated, each %{} variable field is 
    # replaced by a new value.
    #
    # You can thus finely control what your output data will look like.
    #

    "payloads" : [
        "%{timestamp}: New session from IP %{src_ip} UUID %{uuid}.",
        "%{timestamp}: %{owner} visited URL %{url} %{nb_visits} times.",
        "%{timestamp}: %{owner} also uploaded %{outbytes}kb and downloaded %{inbytes}kb."
    ],

    # The "fields" section lets you define various kinds of generated 
    # values. All the supported injector field types are illustrated 
    # below.

    "fields" : {

        "src_ip" : {
            # Generate IPV4 addresses. 
            "type" : "ipv4",

            # You use brackets to control what part of the address 
            # you want to make variable. Here all of them. 
            "format" : "[0-255].[0-255].[0-255].[0-255]"
        },
        "url" : {

            # Take the values from a list. Every time a value is 
            # generated you get the next element of your list.
            "type" : "list",

            # Here is your list. 
            "content" : [
                "GET /ref/index.html HTTP/1.1", 
                "GET /yet/another.html.css HTTP/1.1"
                ]
        },

        "owner" : {
            "type" : "list",
            "content" : ["frank", "bob", "alice", "ted", 
                            "dimi", "ced", "phil", "julien"],

            # This time we want to iterate differently. We want to 
            # send "frank" 3 times then "bob" 3 times and so on. 
            "update_every_loop": false,
            "update_every": 3
        },

        "uuid": {
            # Generate a valid unique string identifier
            "type": "session_id"
        },

        "nb_visits" : {
            "type" : "counter",
            "min" : 0,
            "max" : 12
        },
        "inbytes" : {
            "type" : "random",
            "min" : 1000,
            "max" : 30000
        },
        "outbytes" : {
            "type" : "gaussian",
            "mean": 200.0,
            "deviation" : 30.0,
            "mantissa_precision": 2,
            "always_positive": true
        },
        "timestamp" : {
            "type" : "timestamp",
            "format" :  "dd/MMM/yyyy:HH:mm Z",
            "start_time" : "2012.12.31",
            "start_time_format" : "yyyy.MM.dd",
            "tick_interval" : "1h"
        }
    }
}
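The templating mechanism described in the comments above can be pictured with the following minimal sketch. It is an illustration of the described behaviour, not the injector's implementation: the payloads are visited round-robin, and every generated message draws a fresh value for each field. Whether the injector also advances fields that the current payload does not use is an assumption made here for simplicity. The `render` helper and the sample values are hypothetical.

```python
import itertools
import re

# Matches %{name} placeholders in a payload template.
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def render(payload, generators):
    """Draw one new value per field, then fill in the payload template."""
    values = {name: next(gen) for name, gen in generators.items()}
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), payload)


payloads = [
    "%{owner} visited URL %{url}.",
    "%{owner} uploaded %{outbytes}kb.",
]

# Each field is modelled as an endless iterator of values.
generators = {
    "owner": itertools.cycle(["frank", "bob", "alice"]),
    "url": itertools.cycle(["/index.html", "/about.html"]),
    "outbytes": itertools.cycle([10, 20]),
}

# Round-robin over the payloads, four messages in total.
messages = [render(p, generators)
            for p in itertools.islice(itertools.cycle(payloads), 4)]
for message in messages:
    print(message)
```

Modelling each field as an iterator keeps the rendering step independent of how values are produced, which mirrors the separation between the "payloads" and "fields" sections.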

In many cases you want to send JSON payloads. You can use embedded JSON to make it easier, as this example shows:

"message" : {

    "payloads" : [
        { 
            "time" : "%{timestamp}", 
            "aNumber" : %{number} 
        }
    ],

Note

The resulting file is no longer valid JSON, because %{number} is not enclosed in quotes. The log injector deals with it, provided the field generates a numerical or boolean value.
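The following sketch illustrates why the unquoted placeholder only works with numerical or boolean values. It performs the substitution with plain string replacement, which is an assumption about how the template is expanded, and the sample timestamp is made up for the illustration.

```python
import json

# The embedded payload, written as a string: not valid JSON until
# %{number} has been substituted.
template = '{ "time" : "%{timestamp}", "aNumber" : %{number} }'


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def fill(number_value):
    """Substitute illustrative values into the template."""
    return (template
            .replace("%{timestamp}", "31/Dec/2012:00:00 +0000")
            .replace("%{number}", str(number_value)))


print(is_valid_json(template))     # False: the raw template is not JSON
print(is_valid_json(fill(42)))     # True: a number needs no quotes
print(is_valid_json(fill("bob")))  # False: a bare string is not valid JSON
```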

Here are the supported template field types:

  • ipv4 : to generate ipv4 addresses
  • list : to loop over a set of items
  • counter : an iterating numeric value
  • random : a random numeric value following uniform probability density.
  • gaussian : a random value following a gaussian probability density.
  • session_id : generates a UUID
  • timestamp : a timestamp, for which you fully control the format, the start time, and the tick interval.

Loop control

Whatever the type, you can control the value generation using the following optional parameters:

  • update_every_loop : boolean
    • controls how the field is updated: either on every loop, or once every update_every loops. Note that if set to false, the update_every parameter is mandatory.
    • default: true
  • update_every : int
    • the number of loop iterations before the generated value is changed.
    • default: 1
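
The effect of update_every on a list field can be sketched as follows. This is an illustration of the described behaviour ("frank" 3 times, then "bob" 3 times, and so on), not the injector's code. The generator name is hypothetical, and wrapping back to the first element at the end of the list is an assumption.

```python
import itertools


def list_field(content, update_every=1):
    """Yield list values, moving to the next one every update_every loops.

    Assumption: once the end of the list is reached, iteration wraps
    back to the first element.
    """
    loop = 0
    while True:
        yield content[(loop // update_every) % len(content)]
        loop += 1


owners = list(itertools.islice(
    list_field(["frank", "bob", "alice"], update_every=3), 7))
print(owners)
# ['frank', 'frank', 'frank', 'bob', 'bob', 'bob', 'alice']
```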

list

  • content : array
    • an array of values the injector will loop over.
    • example: [ 1, 2, 3 ], [ "hello", "world" ]

counter

  • min : the (inclusive) min value
  • max : the (inclusive) max value

random

  • min : the (exclusive) min value
  • max : the (exclusive) max value

gaussian

  • mean : int

    • the average value of the distribution.
    • default: 0
  • deviation : int

    • the standard deviation. Note: this means that about 68% of the values fall within mean ± deviation.
    • default: 1
  • mantissa_precision : int

    • number of digits after the decimal point. If set to 0, the decimal point '.' itself is removed (integer).
    • default: 0
  • always_positive : boolean

    • only generate positive values. Note that the gaussian is then also cropped at 2*mean, to keep the mean value intact.
    • default: true
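
The interplay of these four parameters can be sketched as follows. This is a minimal illustration, not the injector's implementation: it assumes that "cropped" means out-of-range draws are rejected and redrawn (rather than clamped), which keeps the distribution symmetric around the mean. `gaussian_field` is a hypothetical name.

```python
import random


def gaussian_field(mean, deviation, mantissa_precision=0,
                   always_positive=True, rng=None):
    """Return one gaussian value formatted as text.

    Assumption: with always_positive, values outside [0, 2*mean] are
    rejected and redrawn, so the average stays at mean.
    """
    if always_positive and mean <= 0:
        raise ValueError("always_positive requires a positive mean")
    rng = rng or random.Random()
    while True:
        value = rng.gauss(mean, deviation)
        if always_positive and not (0.0 <= value <= 2 * mean):
            continue  # cropped: draw again
        # With a precision of 0 the result has no decimal point at all.
        return f"{value:.{mantissa_precision}f}"


rng = random.Random(0)
print(gaussian_field(200.0, 30.0, mantissa_precision=2, rng=rng))
print(gaussian_field(200.0, 30.0, mantissa_precision=0, rng=rng))
```

Cropping symmetrically on both sides is what preserves the mean: cutting only the negative tail would shift the average upwards.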

Return codes


The punchplatform-log-injector.sh utility exits 0 on success, and >0 if an error occurs.

Environment


The following environment variables affect the execution of punchplatform-log-injector.sh:

  • PUNCHPLATFORM_CONF_DIR
    • The PUNCHPLATFORM_CONF_DIR environment variable indicates the directory where tenant and channel configuration files are stored. A 'tenants' subdirectory is expected. Beneath it you will find a tenant, then channel, directory tree.

Bugs

No known bugs.