What Is It?

Read this chapter first to get straight to the point: what is the PunchPlatform, and how valuable is it for your use case?

Warning

This chapter assumes you are somewhat familiar with the open source big data ecosystem: Kafka, Elasticsearch, ELK, Storm, Spark. If not, read the Overview first, then come back here.

The PunchPlatform is a production-ready Elastic stack. It shares the following characteristics with Elastic:

  • It offers a Logstash-style programming language:
    • grok, kv, csv and JSON support is built into the language
  • Easy to install, easy to work with
    • everything is available on a Thales GitLab, reachable worldwide over the Internet.
    • Compile, run, test it yourself: we hide nothing.
    • The standalone (i.e. single-server) version runs fine on macOS, Ubuntu and CentOS, with no virtualization tricks.
    • Run your production configuration on your PC!
  • It is open:
    • your data is in Elasticsearch, use it the way you want and need
    • Receive or forward data from/to any sort of application (including ELK).
  • It supports the Lumberjack protocol:
    • the PunchPlatform is compatible with the Elasticsearch ecosystem, in particular Beats.
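To make the "kv" support mentioned above more concrete, here is a minimal, language-neutral sketch of what key-value parsing does to a log line. It is written in plain Python, not in the Punch language, and the function name `parse_kv` and the sample log line are illustrative assumptions only.

```python
import re

# Matches key=value pairs; a value is either a double-quoted string or a bare token.
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_kv(line: str) -> dict:
    """Turn 'k1=v1 k2="v 2"' into a dictionary (hypothetical helper)."""
    fields = {}
    for key, quoted, bare in KV_PATTERN.findall(line):
        # Exactly one of the two value groups matched; the other is empty.
        fields[key] = quoted if bare == "" else bare
    return fields


# Hypothetical firewall-like log line.
SAMPLE = 'src=10.0.0.1 dst=10.0.0.2 action=deny msg="port scan"'
event = parse_kv(SAMPLE)
```

The resulting dictionary is the kind of structured document that can then be normalized, enriched and indexed in Elasticsearch.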

However, there are quite a few differences with ELK: features, industrialization, and professional services.

Features

  • Configure, run, manage a complete, end-to-end functional data pipeline
    • this is called a channel. It can be distributed over one or several sites.
    • it is in charge of collecting, transporting, processing and ultimately indexing your data in Elasticsearch
    • the channel is a simple but powerful abstraction. That is what you will operate.
    • It is fully described in a simple configuration file.
  • Multi-tenant and secured
    • everything you set up and run in the PunchPlatform is multi-tenant. The PunchPlatform is designed from scratch to let you run shared platforms, hence reducing your costs.
    • Kibana/Grafana dashboards are exposed to end users so that they can safely view their data, and only their data
  • Data Parsing is run in Storm, not in Logstash.
    • This brings you resiliency, scalability, acknowledgement.
  • Plug In Arbitrary Processing
    • The PunchPlatform lets you run Storm or Spark processing, not just Logstash filters.
    • Stream and batch processing are available
    • A Complex Event Processing engine is also available.
  • Design your data pipelines using the Storm concepts:
    • directed acyclic graphs. Plug in your functions.
    • Filter out uninteresting data
    • Route data to the left or to the right depending on any characteristics (data originator, data content, the load, the current weather …)
    • Generate new data (alarms, events, whatever)
    • Benefit from (distributed) exception handling: there is no way to lose your data, you will know where your processing went wrong and will be able to replay it once fixed.
  • The Punch programming language is a real language, compact and expressive.
    • 2 to 5 times more compact than Logstash configuration files.
    • has efficient lookup operators (geoip-like, Patricia tries, interval trees, etc.)
  • It comes with a set of standard log parsers written by SOC experts:
    • that includes parsing, normalization and enrichment, and it is fully documented.
    • These parsers cover most of the firewall and system logs you need to build a SOC platform.
    • Besides, it’s easy to add more, we provide a parser SDK.
  • It comes with an injector tool:
    • Inject any kind of data: logs, HTTP, JSON, XML.
    • Injector (JSON) files are provided for all standard parsers.
    • Design your client dashboards with real data in minutes.
  • End-to-end supervision is built in:
    • PunchPlatforms are deployed together with a supervision (sub) platform
      • Elasticsearch is used as backend
      • Metrics are computed in real time in all software components
      • Monitoring agents are automatically deployed to collect metrics from all servers, network equipment and so on
      • Grafana is used to monitor it all
    • Set up (in minutes) Kibana and Grafana dashboards to visualize
      • the load of your CPUs,
      • the size of your Kafka queues (per topic per partition),
      • the round trip time of your data, etc.
      • your business indicators
    • A REST API exposes the supervision status of everything important
      • third-party supervisors can easily watch the global health of your system.
    • All that is provided across several processing hops
  • The PunchPlatform can be equipped with a Ceph distributed storage solution
    • Archive your data, big or small
    • Choose the level of replication and resiliency
    • It is of course automatically deployed and supervised
  • An archiving service is available
    • archive logs for years
    • Extract and/or replay parts of them easily and efficiently
    • Ciphering and compression are available and performed on the fly
    • Choose your backend. The PunchPlatform Ceph storage is of course a great choice.
  • It helps you (a lot) with configuration:
    • Git is used to hold all the platform system and user configuration files
    • Templating is used to generate all the fine-grained configuration files from simpler, user-friendly templates
    • Each platform is deployed with a central configuration management console.
  • Kibana plugins are provided for end-user extraction
  • Big Data Analytics API
    • Connectors and APIs are provided to let you easily plug your Spark Streaming processing into the data stream
    • Spark-based anomaly detection algorithms are provided; deploy them using simple configuration files:
      • Custom algorithms based on the Spark MLlib libraries
      • Algorithms integrated from the Apache Spot project
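The pipeline concepts listed above (filter, route, generate new data, keep failed inputs for replay) can be illustrated with a small in-memory sketch. This is plain Python, not Storm and not the PunchPlatform API: the `Channel` class, the `parse` function and the `tenant|level|message` input format are all hypothetical and only serve to show how the pieces fit together.

```python
from dataclasses import dataclass, field


def parse(raw: str) -> dict:
    """Parse 'tenant|level|message'; raise ValueError on malformed input."""
    parts = raw.split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    tenant, level, message = parts
    return {"tenant": tenant, "level": level, "message": message}


@dataclass
class Channel:
    """Hypothetical stand-in for a pipeline: parse, filter, route, alert."""
    indexed: dict = field(default_factory=dict)  # tenant -> parsed events
    alarms: list = field(default_factory=list)   # newly generated data
    errors: list = field(default_factory=list)   # failed inputs, kept for replay

    def process(self, raw: str) -> None:
        try:
            event = parse(raw)
        except ValueError as exc:
            # Nothing is dropped silently: keep the raw input and the reason.
            self.errors.append({"raw": raw, "error": str(exc)})
            return
        if event["level"] == "debug":
            return  # filter out uninteresting data
        # Route by data originator (here, the tenant).
        self.indexed.setdefault(event["tenant"], []).append(event)
        if event["level"] == "critical":
            # Generate new data derived from the event.
            self.alarms.append({"type": "alarm", "source": event})

    def replay_errors(self) -> None:
        """Re-run the failed inputs, e.g. once the parser has been fixed."""
        pending, self.errors = self.errors, []
        for failed in pending:
            self.process(failed["raw"])


channel = Channel()
for line in ["acme|info|user login", "acme|debug|noise",
             "globex|critical|disk full", "garbage"]:
    channel.process(line)
```

After this run, `channel.indexed` holds one event per tenant, `channel.alarms` holds one alarm, and `channel.errors` still holds the malformed input. In a real deployment each of these steps would be a node of a distributed graph rather than a method call.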

Important

Altogether, these features make the PunchPlatform significantly different from an ELK stack. Should you go to production on your own with ELK, you will have to reimplement some of them. Think twice about it: none is simple.

Industrialization

Reducing industrialization costs is key. Building, running and maintaining a big data platform must stay under control. This is actually the main PunchPlatform design driver.

The PunchPlatform comes with a deployer. It lets you, the user, describe your complete setup in a configuration file. From there, our tool takes care of everything.

Note

Internally, the deployer relies on Ansible. This is however transparent to you. The PunchPlatform deployer provides all the Ansible recipes and roles needed to automatically deploy all the required software on your target cluster. Ansible inventories (i.e. the actual Ansible files listing your servers, tagged with the right roles, your network interfaces and so on) are generated from your description file. In turn, these inventories are saved and managed in the platform Git-powered configuration.
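As a rough illustration of the idea of generating an inventory from a description, the sketch below renders an INI-style Ansible inventory from a small Python dictionary. The description layout (`PLATFORM`), the host names, the addresses and the function `render_inventory` are assumptions made for this example; they are not the PunchPlatform deployer's actual format or code. Only the inventory syntax (`[group]` sections and the `ansible_host` variable) is standard Ansible.

```python
def render_inventory(platform: dict) -> str:
    """Render an INI-style Ansible inventory with one [group] per role."""
    by_role = {}
    for host, spec in platform["servers"].items():
        for role in spec["roles"]:
            by_role.setdefault(role, []).append((host, spec["address"]))

    lines = []
    for role in sorted(by_role):
        lines.append(f"[{role}]")
        for host, address in sorted(by_role[role]):
            lines.append(f"{host} ansible_host={address}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


# Hypothetical two-server platform description.
PLATFORM = {
    "servers": {
        "node1": {"address": "10.0.0.11", "roles": ["kafka", "elasticsearch"]},
        "node2": {"address": "10.0.0.12", "roles": ["elasticsearch", "storm"]},
    }
}

inventory = render_inventory(PLATFORM)
```

Because the inventory is derived from a single description, committing both to Git gives a reviewable history of every change to the platform layout.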

Once installed, the deployer lets you safely update or upgrade your platform. There is no way to miss a configuration file update on one of your servers! Whether it is updated with a single minor patch, upgraded to a new major version, or extended with more servers or hardware, your platform always stays under control: everything is safely saved and managed for you in the Git-powered platform configuration manager.

Lastly, once the platform is running, having it fully described in Ansible inventories gives you great power for analyzing, checking and auditing it.

Important

These features are probably what makes the PunchPlatform most valuable. They dramatically reduce the cost of building and running a big data platform. Do not go to production without something equivalent. There are many tools, alternatives and offers on the market in this space. What makes the PunchPlatform unique is that it relies on a complete yet extremely light architecture: there is no need to start with a heavy software infrastructure (Hadoop, Mesos, etc.). You can install a very light single-server setup, a light three-server setup, or bigger backend platforms with tens of servers the same way. Even in small setups, all our components are scalable and resilient, and so is your platform.

Professional Services

The features and tools just described are fully documented, so you can go on your own. In case you need assistance, the PunchPlatform team provides the following services:

  • Training
    • Building, running, programming a platform. We have training materials for all aspects
  • Architecture
    • Give us your use cases, we will provide you with
      • Hardware and software recommendations and configurations
      • Functional advice
  • The PunchPlatform software and team can carry out your major platform upgrades without service interruption
    • Keep your production up to date with the latest versions of Elasticsearch, Kibana, Storm, Spark, Kafka, etc.
      • Performing this without service interruption requires a careful plan, and (typically) back-office testing
      • the PunchPlatform team can test your exact configuration on a pre-production platform
      • once ready, the actual update can be performed with the assistance of one of our experts
  • Log/Data Parser
    • Writing your parsers on your own is easy, but probably not what you want to do: you will end up with your own parsers to maintain.
    • Instead, ask us for the ones you need, if they are not already provided by the platform.
    • We code them for you, and they will be supported in all subsequent platform releases.
    • Benefit in turn from the community parsers
  • Support
    • Benefit from support and bug fixes

Contact us for details: dimitri.tombroff@thalesgroup.com, claire.bazin@thalesgroup.com