PunchPlatform versus ELK

Abstract

The PunchPlatform can be used as an ELK (Elasticsearch-Logstash-Kibana) solution. A very common (and very good) question is to know what exactly the difference is. This guide goes through a simple example to highlight the differences and similarities between the two platforms.

Logstash versus Topology versus Channel

Using Logstash you usually do something like this:

image

You start a Logstash process in charge of reading some data, transforming it and indexing it into Elasticsearch. Easy enough. The equivalent concept in the PunchPlatform is a topology. It looks like this.

image

It is exactly the same idea but with a different processing engine. Instead of Logstash, you rely on Apache Storm. It is just as easy, but much more powerful! The real power of the PunchPlatform comes from combining several topologies into a channel. It starts looking like this.

image

You could do the same with Logstash. But don't! You would spend a lot of time deploying, configuring and supervising many Logstash servers. With the PunchPlatform, that is already taken care of: you only need to write a JSON file to define your channel.

Important

When you prototype or discover Elasticsearch, do not hesitate to start with topologies. These are easy to work with. To submit them to the Storm cluster, you only have to define them as part of a channel. Submitting a channel goes through a real Storm job submission process, so you benefit from all the Storm goodies, in particular visualizing your topology metrics through the Storm UI.

Yahoo Stock Example

We will use the example described in http://blog.webkid.io/visualize-datasets-with-elk. A simple dataset is taken from Yahoo's historical stock database; each row holds the Date, Open, High, Low, Close, Volume and Adj Close values. You can download the data as a simple CSV file. It looks like this:

2015-04-02,125.03,125.56,124.19,125.32,32120700,125.32  
2015-04-01,124.82,125.12,123.10,124.25,40359200,124.25  
2015-03-31,126.09,126.49,124.36,124.43,41852400,124.43  
2015-03-30,124.05,126.40,124.00,126.37,46906700,126.37  
2015-03-27,124.57,124.70,122.91,123.25,39395000,123.25  
2015-03-26,122.76,124.88,122.60,124.24,47388100,124.24  

The point is simply to ingest that data into Elasticsearch, then visualize it using Kibana.

The only significant difference between ELK and the PunchPlatform lies in how you design your input-processing-output channel. With ELK you write a single Logstash configuration file that combines it all.
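For comparison, here is a rough sketch of what that single Logstash file could look like for this dataset. It is only illustrative: the localhost:9200 Elasticsearch address and the stock-%{+YYYY.MM.dd} index name are assumptions roughly matching the daily "stock-" indexes used further below, and you would adapt them to your setup.

input {
  file {
    path => "/tmp/csv-table-file.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]
    convert => {
      "Open" => "float"
      "High" => "float"
      "Low" => "float"
      "Close" => "float"
      "Volume" => "integer"
      "Adj Close" => "float"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "stock-%{+YYYY.MM.dd}"
  }
}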

Using the PunchPlatform you use a topology configuration file and a punchlet. The topology file defines the input-processing-output structure. It can range from a simple straight file-punchlet-Elasticsearch sequence (as in this tutorial) to arbitrarily complex and distributed combinations.

As for the punchlet, it contains the data transformation logic.

Tip

True, you have two files instead of one. The reason for that is to clearly separate the processing logic (parsing, enriching and normalizing the data) from the deployment and configuration logic. It actually is a lot cleaner: you write your punchlet once, then reuse it in many different input/filter/output pipes.

Let us check the csv-processing.punch punchlet you need for this example:

{
    csv("Date","Open","High","Low","Close","Volume","Adj Close")
        .delim(",")
        .inferTypes()
        .on([logs][log])
        .into([logs][log]);
}

Hopefully you can guess what it does: it converts the CSV data into a clean JSON document ready to be inserted into Elasticsearch. What probably looks strange to you are the [logs][log] parts.

An example will make that clear. The input data received by your punchlet looks like this:

{
  "logs": {
    "log": "2017-02-21,136.229996,136.75,135.979996,136.699997,24265100,136.699997"
  }
}

After the punchlet it looks like this:

{
  "logs": {
    "log": {
      "High": 137.119995,
      "Low": 136.110001,
      "Volume": 20745300,
      "Adj Close": 137.110001,
      "Close": 137.110001,
      "Date": "2017-02-22",
      "Open": 136.429993
    }
  }
}

The punchlet has simply replaced the input [logs][log] field with a new one consisting of key-value pairs.

Compared to the Logstash filter syntax, the Punch Language is a real programming language, more compact and a lot more expressive than configuration files.
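As a purely illustrative sketch of that expressiveness (not part of this tutorial), a punchlet can mix operators with plain statements. The daily_range field and the asDouble() accessor below are assumptions made for the example:

{
    csv("Date","Open","High","Low","Close","Volume","Adj Close")
        .delim(",")
        .inferTypes()
        .on([logs][log])
        .into([logs][log]);

    // Illustrative only: derive an extra field from the parsed values.
    // asDouble() is assumed here to read a tuple leaf as a double.
    [logs][log][daily_range] = [logs][log][High].asDouble() - [logs][log][Low].asDouble();
}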

Next, you need to set up a topology configuration file to read this CSV file, run the punchlet and send the result to Elasticsearch. Let's take a look at what we get:

{
    "name" : "file_injector",
    "spouts" : [
        {
            "type" : "file_spout",
            "spout_settings" : {
                "path" : "/tmp/csv-table-file.csv"
            },
            "storm_settings" : {
                "component" : "file_spout",
                "publish" : [ 
                  {  "stream" : "logs" , "fields" : ["log"] } 
                ] 
            }
        }
    ],
    "bolts" : [
        {
            "type" : "punch_bolt",
            "bolt_settings" : {
                "punchlet" : "examples/csv-processing.punch"
            },
            "storm_settings" : {
                "component" : "punch_bolt",
                "publish" : [ { "stream" : "logs", "fields" : ["log"] } ],
                "subscribe" : [
                    {
                        "component" : "file_spout",
                        "stream" : "logs",
                        "grouping": "localOrShuffle"
                    }
                ]
            }
        },
        {
          "type": "elasticsearch_bolt",
          "bolt_settings": {
            "cluster_id": "es_search",
            "per_stream_settings" : [
              {
                "stream" : "logs",
                "index" : { "type" : "daily" , "prefix" : "stock-" },
                "document_json_field" : "log"
              }
            ]
          },
          "storm_settings": {
            "component": "elasticsearch_bolt",
            "subscribe": [ 
              { "component": "punch_bolt", "stream": "logs", "grouping" : "localOrShuffle" } 
            ]
          }
        }

    ],
    "storm_settings" : {
        ...
    }
}

Note

Instead of using Logstash concepts (inputs, filters and outputs), you use Apache Storm concepts: spouts are the input functions, bolts are both the processing and the output functions. You then describe your execution graph in this topology file.

To run this example, go to the $PUNCHPLATFORM_CONF_DIR/examples folder. You can run the topology using the following commands.

$ cd $PUNCHPLATFORM_CONF_DIR/examples/topologies/files/csv_to_stdout_and_elasticsearch
$ punchplatform-topology.sh csv_to_stdout_and_elasticsearch.json
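
Once the topology has run, you can check that the documents landed in Elasticsearch, for instance with curl. This assumes Elasticsearch answers on localhost:9200; adapt the address to your es_search cluster.

$ curl 'localhost:9200/_cat/indices/stock-*?v'
$ curl 'localhost:9200/stock-*/_search?size=1&pretty'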

To go further, keep reading the original blog post and have fun visualizing your data!

Where to go from there

As you can see, Logstash pipelines and topologies are similar concepts: both are pipelines. Where the Punch clearly differentiates itself from ELK is in providing state-of-the-art pipeline technologies. Parsing logs is one thing, but you are likely to need much more power: machine learning, aggregations, massive extractions, data replay, etc.

That is the difference.

Conclusion

We tried to make the Punch as simple as ELK to ingest data, try it out and prototype a solution. One of our customers designed its complete supervision solution in just a week.

That said, the learning curve of the Punch is slightly steeper, because you have to be at ease with Storm concepts. But once comfortable, your way to production, performance and resiliency will be dramatically shorter.