Roadmap

This document provides the high-level PunchPlatform roadmap. Features listed here are macro features, planned for specification, design and development. Because the PunchPlatform is a programmable platform deployed in different projects, client-requested features can be accepted along the way and change the proposed roadmap.

Warning

Some features are planned for delivery in a given quarter (Q1, Q2, Q3, Q4). Many will be delivered continuously over several quarters; these are indicated using a Qstart-Qend notation.

Refer to the https://confluence.agora-t.net/display/PunchPlatform/PunchPlatform+Home space for details.

2017 - 2018

Quick View

2017                       2018
 ------------------------Q3------------------------Q1------------------------Q2------------------------Q3
 < - Micro PunchPlatform ->
 < - PML : Spark Sdk - - - - - - - - - - - - - - - >
 < - Spark/Hadoop Elasticsearch  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >
                           < - PunchTower Cloud Deployer - - - - - - - - - - ->

Features

  • (Q3) Micro PunchPlatform: we are working on downsizing the key PunchPlatform components and functions so as to run small configurations, typically (but not limited to) LTRs. The goal is to achieve excellent performance using only a few MB of RAM, while still benefiting from all the key platform services. The micro PunchPlatform will leverage two important new features:
    • Lightweight topologies: we implemented a small-footprint Storm execution engine to execute user topologies with minimal resources. This is especially important for running LTRs. This new execution engine will be fully compatible with the PunchPlatform topology configuration files, so existing systems will not require any change to benefit from it.
    • Shiva: a lightweight distributed task manager. Shiva can schedule arbitrary components, such as Elastic Beats or lightweight PunchPlatform topologies, for execution. With Shiva, there is no need to deploy Storm clusters and services on LTR- or LMR-like systems.
  • (now - Q1) PML (Spark SDK): PML is the PunchPlatform Machine Learning stack. It is fully based on, and fully compatible with, the Spark ML Pipeline and Spark Structured Streaming stacks. The idea is to let our users design Spark pipelines and schedule them for execution using configuration files, so as to speed up the deployment, tuning and time-to-production of machine learning functions.
    • Check the documentation for details about the approach.
    • This is the key PunchPlatform feature, itself made up of several sub-features:
      • PunchPlatform plan scheduler: lets users define their execution plans.
      • PunchPlatform feature computation libraries: expose convenient functions directly under the native Spark API.
      • PunchPlatform SDK: lets everyone add their own functions to the shelf.
  • (now - Q3) Spark/Hadoop Elasticsearch integration. This is the runtime counterpart of PML. Our users will be offered a state-of-the-art yet production-ready Spark runtime to process their Elasticsearch data.
    • The focus is on integrated supervision, automated deployment and ease of use.
    • Data transfer to/from external stacks will be supported.
  • (Q1) PunchTower Cloud Deployer. The PunchPlatform deployer has proven simple to use and versatile: our users deploy arbitrary PunchPlatform variants in a matter of hours or days. We will now start integrating our (Ansible-based) deployment modules into existing and widely used deployment platforms such as Hortonworks Ambari, NiFi or Pivotal stacks. This in turn will allow all or selected PunchPlatform components to be deployed as part of existing user infrastructures and platforms.

2016 - 2017

Quick View

2016                       2017
 ------------------------Q3------------------------Q1------------------------Q2------------------------Q3
 < - OPENJDK - >
 < - CEP - - - - - - - - >
 < - DEPLOYMENTGUIDES - >
 < - SPARK - - - - - -  >
 < -  - - - - - PunchTower Deployer and Supervisor - - - - - - - - - - - - - - - - - - - - - - - - - - - >
 < -  - - - - - Ceph Distributed Storage - - - - - - - - - - - - - - - >
                         < - - - - - - Archiving - - - - - - - - - - - - - >
                         < - - - - - - Ciphering - >
                         < - - - - - - Historian - - - - - - - - - - - - - - - - - - - - - - - - - - - >
                         < - - - - - - SparkStreaming Support -  - - - - - - - - - - - - - - - - - - - >
                         < - - - - - - Spark Support - - - - - - - - - - - - - - - - - - - - - - - - - >
                         < - - - Kibana Plugins for extraction and reporting - - >
                         < - - - - - - MachineLearning - - - - - - - - - - - - -  - - - - - - - - - - - >
  < -  - - - - - CyberSecurity Parsing/Normalization - - - - - - - - - - - - - - - - - - - - - - - - -  >
                                                     < -  - - - - - Alerting & Correlation - - - - - - ->

Features

  • (Q1) (CEPH) Leverage Ceph distributed storage
  • (Q1-Q3) PunchTower Deployer and Supervisor
    • Modular deployment of selected PunchPlatform components
    • Integrated supervision using Telegraf and Beats
    • Spark/Cassandra/OpenTSDB
    • Elasticsearch-hadoop and Cassandra-hadoop connectors
  • (Q1-Q2) (Archiving) Archiving service on top of Ceph and external SANs
    • The archiving service lets users save and extract petabytes of data (in particular logs) in a resilient and secure storage.
    • The Ceph storage provides a distributed, cost-effective solution.
  • (Q1-Q3) Historian support through a new Cassandra backend
    • An OpenTSDB/Cassandra backend is integrated to provide optimal support for time series/SCADA data.
    • Grafana is used to visualize the data in real time.
  • (Q1-Q2) Ciphering: on-the-fly data ciphering
  • (Q1-Q3) Spark support
    • The support for Spark jobs running on top of Elasticsearch will allow large datasets to be efficiently processed by arbitrary algorithms
    • Running machine learning calibration and learning algorithms is the key target use case.
    • Spark Machine Learning libraries will be integrated.
  • (Q1-Q3) Spark Streaming support
    • Spark Streaming jobs will be supported and pluggable as part of PunchPlatform channels.
    • APIs and connectors will be offered to third-party developers.
    • Ready to use streaming algorithms will be offered and fully integrated
  • (Q1-Q3) Integration of Machine Learning, including algorithms from Theresis and Centai
  • (Q1-Q3) Kibana plugins for log extraction and generation of reports
  • (Q2-Q3) Alerting & Correlation
    • Ready-to-use configurations for cybersecurity or supervision alerting and correlation rules

2016 Features

  • (Q3) (OPENJDK) Migrating from Oracle JDK to OpenJDK 8
  • (Q3) (SPARK) Complete third-party Spark Streaming integration in standalone mode
  • (Q3) (DEPLOYMENTGUIDES) Deployment guides for mono-server, cluster and LTR setups
  • (Q3) (CEP) Complex Event Processing Storm integration