CRAIG 5.2.0

Below is a summary of the JIRA issues addressed in the CRAIG-5.2.0 release of Punchplatform. For full documentation of the release, a guide to get started, and information about the project, see the Punchplatform project site.

Note about upgrades: Please review the upgrade documentation for this release carefully before upgrading your cluster. The upgrade notes cover incompatibilities, breaking changes, performance changes, and any other changes that might impact your production deployment of Punchplatform.

The documentation for the most recent release can be found at https://doc.punchplatform.com.

Release summary

This release brings major fixes to version 5.1.2. It also improves deployment, simplifies and optimizes PML execution, and provides a new punchctl command.

Important changes

Please read the Migration Guide from 5.1.2 to 5.2.0.

Release notes

Major changes

  • [PP-3188] - Upgrade channel_structure format
  • [PP-3096] - update PML default format
  • [PP-3153] - Rework Shiva Elasticsearch logs
  • [PP-3194] - Convert default ES index type to '_doc'
  • [PP-3199] - Change name for metricbeat, packetbeat, auditbeat, filebeat indices: platform--
  • [PP-3201] - Install and start Auditbeat on standalone

Main subject

  • [PP-1890] - CEPH for production
  • [PP-2010] - Archiving refactor (Configuration, indexation)
  • [PP-2510] - have a state of the art documentation system
  • [PP-3036] - Security Audit
  • [PP-3049] - punch should fully support binary data

New feature

  • [PP-2511] - start logstash from channel
  • [PP-2997] - Auditbeat: Support multiple paths, with recursion, and exclude files
  • [PP-3024] - Spark in deploy mode cluster - broadcasting of files
  • [PP-3081] - Package pyspark into the standalone
  • [PP-3120] - Support binaries in tuples
  • [PP-3154] - add token authentication feature for elasticsearch bolt
  • [PP-3171] - Kafka Bolt should expose "group by key" Kafka feature
  • [PP-3050] - add binary support to http spout
  • [PP-3051] - add a tryEpochSec and tryEpochMs operator

Improvement

  • [PP-1934] - review/finalise deployment of CephFS
  • [PP-2053] - Add spark settings into graph implementation
  • [PP-2652] - HOWTO deal with light topologies
  • [PP-2695] - Standardized Begin/End+Rc events in shiva-logs
  • [PP-2865] - create a kibana dashboard like the storm UI for light topologies
  • [PP-3017] - Elastic Common Schema
  • [PP-3019] - Evaluate code/product quality solutions
  • [PP-3037] - CERT IST vulnerability report
  • [PP-3038] - Vulnerability research on a "production" platform
  • [PP-3039] - Vulnerability research on embedded COTS
  • [PP-3052] - enrich punch language with geospatial goodies
  • [PP-3063] - Use storm 'extlib/' additional jars
  • [PP-3092] - make shiva launch a pml job or plan
  • [PP-3108] - simplify and improve java pml
  • [PP-3109] - Integrate PML nodes to use Keras models in pyspark
  • [PP-3155] - pml fails to start in foreground with kafka stream mode
  • [PP-3156] - Package PML, logstash & light-topo for Shiva
  • [PP-3197] - Use a dedicated jar to submit Storm topologies

Bug

  • [PP-2111] - 1 node data transport does not set up the environment properly when $PUNCHPLATFORM_CONF_DIR is already set
  • [PP-2123] - same doc re-deployment should not report "changed"
  • [PP-2191] - Storm exits without printing an error
  • [PP-2263] - Shiva missing logs
  • [PP-2405] - log-injector "total_message" infinite does not work
  • [PP-2445] - unable to test grok pattern with the grok debugger when double quotes are in the logs
  • [PP-2504] - Brad-to-Craig migration page is incomplete (and sometimes misleading)
  • [PP-2512] - --configure file not found
  • [PP-2698] - punchplatform-service --start reports STOPPED for running services
  • [PP-2743] - "No appenders for logger org.apache.http.client.protocol.RequestAddCookies" for punchplatform-channel
  • [PP-2744] - On big platforms, index mapping for index 'punchplatfom' must allow for more than 1000 fields
  • [PP-2752] - On centos, ssh port is mandatory in git_bare section
  • [PP-2805] - Punchplatform--configure not working on standalone (missing jinja2)
  • [PP-2850] - PML components documentation not complete in product doc
  • [PP-3010] - Syslog_spout(UDP) listens in TCP
  • [PP-3018] - punchplatform-analytics.sh job submit on a spark cluster (multiple machine) does not work
  • [PP-3022] - pp-analytics.sh, in client mode -> spark ui does not show job progression metrics
  • [PP-3025] - Fix PML file output node to take into account spark.files parameter
  • [PP-3028] - Wrong documentation for spark.cluster.slaves.listen_address
  • [PP-3029] - incorrect environment for centos Kibana plugin
  • [PP-3030] - kibana plugin should provide good PUNCHPLATFORM_CONF_DIR to subprocesses
  • [PP-3031] - cannot extract from plugin
  • [PP-3034] - FileBolt receives wrong numbers of batch
  • [PP-3041] - Placeholder '%{tenant}%' only works if the 'tenant' attribute is filled in the topology file
  • [PP-3057] - Bug supervisord on kibana-server roles on redhat
  • [PP-3064] - Wrong json resource in topology raises NullPointerException
  • [PP-3068] - Missing documentation for /etc/conf.json parameters
  • [PP-3070] - Missing log42j.properties stack trace upon channel stop
  • [PP-3073] - Online puncher example "Parsing with grok" broken
  • [PP-3074] - Wrongly "mandatory" parameters
  • [PP-3077] - Broken elasticsearch housekeeping script
  • [PP-3078] - Geoip resource broken
  • [PP-3079] - Service "elasticsearch_housekeeping_service" does not work
  • [PP-3080] - local_timestamp broken in SyslogSpout
  • [PP-3089] - punchlet executed from the online puncher cannot have comments
  • [PP-3093] - File output node problem
  • [PP-3094] - Ensure pyspark is working with python3
  • [PP-3122] - DateOperator Daylight Saving Time Issue
  • [PP-3132] - wrong shiva-logs index
  • [PP-3160] - fix PySpark dist issue
  • [PP-3176] - Plugin deployment template
  • [PP-3180] - Tuple leafAsString() does not support List
  • [PP-3181] - bug commons-lib.sh when user group has a space
  • [PP-3182] - Need new storm parameter in deployer/roles and conf when the hostname isn't resolved
  • [PP-3183] - wrong permissions on storm apache-storm on RedHat
  • [PP-3184] - No jq on shiva nodes
  • [PP-3186] - typo on elasticsearch-server/templates/
  • [PP-3202] - On centos, Kafka and storm are not restarted if they failed
  • [PP-3204] - Platform monitoring of a 1-node with spark indicates spark slave and master as red (falsely)
  • [PP-3208] - smart tuple replay is not activated in craig
  • [PP-3210] - Shiva does not publish unassigned tasks
  • [PP-3214] - pml takes forever scanning types in big jar
  • [PP-3215] - channel configure error
  • [PP-3207] - dataset generator

Internal Task

  • [PP-1987] - Add documentation about adding a new Ceph pool when installing a new tenant
  • [PP-2208] - Implement a reindex capability on objects-storage data
  • [PP-2565] - Up-to-date Markdown and javadoc for syslog, lumberjack, punch, ES, kafka spouts + bolts
  • [PP-2878] - Run native shell command from Shiva arg
  • [PP-2995] - Test PML Kmeans performance with apache logs
  • [PP-3015] - Spark deployer, nodes not binding to the right ip
  • [PP-3033] - Update Punchplatform.io
  • [PP-3035] - spark settings
  • [PP-3042] - PySpark - make elasticsearch batch output prototype
  • [PP-3046] - improve pyspark pml code
  • [PP-3047] - implement PML kafka output node in python
  • [PP-3053] - improve PML python elastic batch input node to match PML java
  • [PP-3054] - make python3 default version for pyspark
  • [PP-3055] - Python PML: implement python node
  • [PP-3060] - Review/update the SublimeText plugin
  • [PP-3065] - improve ip extraction punch language capabilities
  • [PP-3066] - websense parser automatic test fail
  • [PP-3088] - unsupported en_FR locale
  • [PP-3101] - added shared resources as part of the putconf --platform
  • [PP-3104] - elastic nodes must be better documented and improved
  • [PP-3105] - Make the Kibana plugin compatible with HJSON document jobs
  • [PP-3107] - Properly handle HJSON plan/jobs
  • [PP-3114] - Capitalize 'PunchPlatform_*' variables from pp-env.sh
  • [PP-3127] - Use checksum with pp-external downloads
  • [PP-3128] - File input node does not work in deploy mode cluster/client
  • [PP-3130] - Upgrade Storm to 1.2.2 (last release)
  • [PP-3131] - pp-analytics.sh bug
  • [PP-3133] - Big picture of the terminal UX for a new user
  • [PP-3135] - Indexed Archiving filesystem example for pp-resource
  • [PP-3137] - punchplatform-channel --status error
  • [PP-3138] - Set spark python driver to v3.6
  • [PP-3143] - Understand what we do when submitting a topology to Storm cluster
  • [PP-3144] - Interaction between Storm and Zookeeper
  • [PP-3149] - Shiva monitoring illegalArgumentException
  • [PP-3170] - ssl_certificate_verification elasticsearchbolt
  • [PP-3172] - Craig v5.2.0 pre-release tests
  • [PP-3179] - PML aggregation example (JAVA)
  • [PP-3209] - Review PML Academy example on craig-5.2.0

Testing

  • [PP-3205] - craig-5.2.0 release tests