Before You Start¶
How It Works¶
To deploy a punchplatform you need a deployer laptop or server and the target servers where you want to deploy your platform. The starting situation is illustrated next.
The punchplatform-deployer.sh tool is a punchplatform software tool delivered as part of the installation package. All you have to provide is a description of your target platform, i.e. which components you want to deploy on which servers.
Running that tool is easy and fully automated. What is not automated and what is key for you to have is a clear idea of your design. Because the punchplatform (and its deployer) are extremely modular, you have a wide range of options from deploying a small one-node system up to a full-fledged giant clustered platform.
In short: make sure you go through the rest of this chapter before you start deploying. It will help you.
Configuration Management¶
To start with, it is important to understand the punch configuration logic and how users interact with the platform. The deployer helps you create the working environment for the platform operators and users. An operator typically acts upon the platform through a few command-line utilities, from a well-identified administration server where these commands have been installed. This is depicted next, where the yellow server is the one where the operator environment has been deployed:
These commands are of two kinds. The first two (punchplatform-getconf.sh and punchplatform-putconf.sh) let the operator save or load the per-tenant configuration folder. The other two let the user start or stop channels and services.
To store the configuration, the punch relies on ZooKeeper, a centralized service for maintaining configuration information. Loading or saving the configuration is as simple as executing getconf and putconf:
At the very start, somebody has to create a new configuration tree for the tenant. Here is how it typically works: after the punch is deployed, no configuration is defined, i.e. the deployer is not in charge of creating a tenant configuration. The platform is up and ready, but empty. It looks like this:
Somebody must therefore create the first configuration. This is typically a post-deployment operation. It can be done from the deployer machine as illustrated next:
The operator is then ready to use the platform.
Note
As you can see, it cannot be simpler. The punch design driver is to make deployment and operation as straightforward as possible.
Methodology¶
Deploying a complete PunchPlatform requires four steps.
Design¶
The PunchPlatform can be architected in many different ways, to serve different purposes and to take into account various networking and security issues. The point is: do not start deploying a PunchPlatform if you are unclear about your architecture:
- where do logs/data come from? Do you need virtual IPs, load balancing, failover of some of the components?
- will your platform scale?
- are you sure you have enough Kafka, Elasticsearch or Ceph storage capacity to fulfill your SLAs?
- etc.
The PunchPlatform stack has been designed from day one to make it extra easy to deploy a production setup. This is a significant advantage, as it drastically reduces the cost of your project build. That said, part of the success will come from you. If you carefully read the deployment documentation and understand your architecture and components, you can deploy a complete system in a few hours. It will take days or months if your requirements (OSes, networking, VMs, etc.) are not met, just like with any large-scale distributed application.
The good news is that the training material of the PunchPlatform is quite rich and easy to work with on any laptop, VM or workbench. Our recommendation for building a platform is the following.
Let us assume that you have system administrator skills, and that (i) you are completely new to the PunchPlatform and (ii) you are asked to deploy a distributed multi-tenant, multi-site setup with a petabyte of storage.
How should you proceed? Here is our recommendation:
Day 1: deploy a standalone PunchPlatform on your laptop, and follow the 2-minute and 5-minute tours of the Getting Started guide. This is quick and fun, and you end up with an LTR and LMC combination with every functionality at hand. Take some time to see how you operate and run the platform: starting and stopping channels, having a look at the various UIs, understanding the key concepts.
Day 2: deploy a production setup, again on your laptop. Do it for an LTR, and for a single-server LMC. Check the PunchPlatform Confluence space: there we have blog posts explaining how we do that (daily) using vagrant or docker, or on our native Linux or macOS laptops. If you do not have such a laptop, our recommendation is: buy one! If and only if you cannot get one, go to the Thales Cloud, Amazon or OVH and use VMs.
After this second step you will understand what a production deployment is about, how the various components are monitored and supervised, how you can decide in which folder/partition the data lives, where the logs are, which version is installed, etc. You will also understand how to write the two key configuration files that describe a platform.
Day 3: you are all set! Check that your architect/project leader gave you clear and concise instructions. Make sure you have the required target architecture, hardware and/or servers, networking and firewalls. Do not go further before making sure that all of that is up and running. We have great tooling for you to check all that.
Then and only then, start deploying your project PunchPlatform.
Select the Right Package¶
First, identify the right deployer package and download it from the punch website in the download area section. You have the following packages available:
Use Case¶
DataTransport¶
A ready-to-use package to run a data collector and forwarder node. It receives data on syslog TCP sockets and forwards it, using the lumberjack protocol, to another punchplatform.
This package has only basic processing capabilities. It is meant to collect the data on remote sites and forward it to a (typically central) punchplatform where you run your processing.
DataManagement¶
A ready-to-use package to receive, process, index your data and save it locally.
This is a typical ELK-like all-in-one package, and a good starting point to set up a log management server use case.
DataAnalytics¶
This package is similar to DataManagement but adds some advanced machine learning capabilities.
Single or Multi Nodes¶
You have two packaging options:
- 1-Node package: this option is the easiest to start with. The packages are auto-configured to run all components and services on a single server. If you need to scale to more servers, start from the 1-Node package and adapt its configuration.
- Deployer package: this package lets you define your platform on your own. We recommend you use it only with the assistance of the punch expert service.
Tip
These packages are meant to accelerate your deployment. For more details, do not hesitate to contact us at contact@punchplatform.com. We also provide training to help your integrators during the installation.
You are ready to install and setup the deployer environment. Refer to the Deployer setup guide.
Install Prerequisites¶
You have your PunchPlatform up and ready. What you must do next is create your channels, according to your business-specific use cases.
Deployer Installation Guide¶
Requirements
The punchplatform deployer is supported on macOS, Ubuntu 16.04 or later, CentOS 7 and RedHat 7 (not tested).
We recommend:
- 2 CPUs
- 4 GB memory
- 30 GB additional storage
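As a quick pre-flight check, the following sketch compares a Linux deployer with these recommendations. It is only an illustration: the function names (meets_minimum, cpu_count, mem_gib, free_gib, report, check_deployer_resources) are hypothetical, it assumes nproc, df and /proc/meminfo are available, and it does not apply to a macOS deployer as written.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight sketch for a Linux deployer (not a punchplatform tool).
# Relies on nproc, df and /proc/meminfo; macOS would need sysctl-based equivalents.

# meets_minimum ACTUAL REQUIRED: succeed when ACTUAL >= REQUIRED (integers)
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# cpu_count: number of processing units available to this process
cpu_count() {
  nproc
}

# mem_gib: total memory in GiB, rounded to the nearest integer
# (MemTotal is in kB and is usually slightly below the nominal RAM size)
mem_gib() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# free_gib DIR: available space in whole GiB on the filesystem holding DIR
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# report LABEL ACTUAL REQUIRED UNIT: print one comparison line
report() {
  if meets_minimum "$2" "$3"; then
    echo "$1: $2$4 (ok)"
  else
    echo "$1: $2$4 (recommended: $3$4)"
  fi
}

# check_deployer_resources [DIR]: DIR is where the package will be unpacked
check_deployer_resources() {
  local dir="${1:-.}"
  report "CPU" "$(cpu_count)" 2 ""
  report "Memory" "$(mem_gib)" 4 " GiB"
  report "Free space on $dir" "$(free_gib "$dir")" 30 " GiB"
}
```

For example, check_deployer_resources /data prints one line per recommendation. Memory is rounded rather than truncated so that a nominal 4 GB machine is not flagged just because the kernel reserves part of it.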
In the rest of this paragraph we describe the required setup for each supported deployer type.
Ubuntu¶
Install the following packages:
```shell
sudo apt install \
    unzip \
    curl \
    git \
    jq \
    python \
    python-pip \
    sshpass
```
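To confirm the installation, the hypothetical helper below lists which of the expected commands are still missing from PATH. The command names are an assumption derived from the package list: on the Ubuntu releases that ship it, the python-pip package is expected to provide the pip command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report tools that are not found on PATH.

# missing_commands CMD...: print each CMD that is not found on PATH, one per line
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Usage:
#   missing=$(missing_commands unzip curl git jq python pip sshpass)
#   [ -z "$missing" ] || echo "still missing: $missing"
```

An empty output means every listed command is available.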
Install ansible:
```shell
# ansible 2.3.0
sudo pip install ansible==2.3.0
```
If you encounter problems installing this required version of Ansible from your available repositories, you will find an offline setup tool in the punchplatform deployment package:
```shell
unzip punchplatform-deployer-x.y.z.zip
cd deployment_dependencies
unzip ansible-2.3.0-pippackages.zip
cd ansible-2.3.0-pippackages
sudo ./install.sh
```
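Whichever installation route you used, you can check the result. The sketch below is an assumption-based helper: it relies on Ansible 2.x printing "ansible <version>" on the first line of ansible --version (2.3 releases typically report 2.3.0.0, which pip treats as equal to 2.3.0). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check that the installed Ansible is the expected 2.3.0 release.

# ansible_version: read `ansible --version` output on stdin and print the
# second field of its first line (the version number for Ansible 2.x)
ansible_version() {
  awk 'NR == 1 { print $2; exit }'
}

# is_expected_version VERSION: succeed for 2.3.0 or 2.3.0.<n>
is_expected_version() {
  case "$1" in
    2.3.0 | 2.3.0.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   v=$(ansible --version | ansible_version)
#   is_expected_version "$v" || echo "unexpected ansible version: $v"
```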
Last, only if you need to deploy a CEPH cluster, install the Ceph packages on the deployer machine:
```shell
# on Ubuntu 18; otherwise use the provided ceph-13 packages
sudo apt install ceph
```
The Ceph packages are needed on the deployer because some of the deployment steps require Ceph tools.
Centos / RedHat¶
Install the following packages:
```shell
sudo yum install \
    vim \
    wget \
    unzip \
    curl \
    git \
    jq \
    python \
    sshpass \
    python-pip
```
On CentOS, the previous command may report "No package jq available.". In that case, enable the EPEL repository:
sudo yum --enablerepo=extras install epel-release
This command installs the EPEL repository matching the CentOS version you are running. You will then be able to install jq and python-pip.
Install Ansible (if you have neither internet access nor a local pip repository, use the offline variant below):
```shell
# ansible 2.3.0
sudo pip install ansible==2.3.0
```
If the previous command fails, or if this specific version of Ansible is not available from your repositories, use the offline setup tool shipped in the punchplatform deployment package:
```shell
unzip punchplatform-deployer-x.y.z.zip
cd deployment_dependencies
unzip ansible-2.3.0-pippackages.zip
cd ansible-2.3.0-pippackages
sudo ./install.sh
```
Last, perform the following actions:
```shell
# disable firewalld on all devices
sudo systemctl disable firewalld
sudo systemctl stop firewalld
sudo vi /etc/sysconfig/selinux
# change the following line:
#   SELINUX=enforcing
# to:
#   SELINUX=disabled
# and restart the machine
```
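If you prefer a non-interactive edit to the vi step, here is a hedged sketch using GNU sed. It assumes the usual CentOS/RedHat 7 layout, in which /etc/sysconfig/selinux is a symlink to /etc/selinux/config; because sed -i would replace that symlink with a plain file, the sketch edits /etc/selinux/config directly. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to read and change the configured SELinux mode.

# selinux_config_mode [FILE]: print the SELINUX= value configured in FILE
# (SELINUXTYPE= lines are not matched because the pattern requires "SELINUX=")
selinux_config_mode() {
  sed -n 's/^SELINUX=\(.*\)$/\1/p' "${1:-/etc/selinux/config}"
}

# disable_selinux_in [FILE]: set the configured mode to disabled (GNU sed -i)
disable_selinux_in() {
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "${1:-/etc/selinux/config}"
}

# Usage (as root), then reboot as described above:
#   disable_selinux_in
#   selinux_config_mode   # should print: disabled
#   getenforce            # running mode; reports Disabled after the reboot
```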
MacOS¶
macOS works as a deployer server, except that it does not allow you to deploy a CEPH cluster. Be careful to install Ansible version 2.3.0 as shown below.
First install Xcode. Then install the following packages (the last command requires Homebrew):
```shell
sudo easy_install pip
sudo pip install ansible==2.3.0
brew install coreutils
```
Additional Package Installation
Download one of the deployer packages provided by the punchplatform team or from punchplatform.com.
We recommend moving it to a large storage partition (for instance /data):
```shell
wget <link>
sudo mkdir -p /data
sudo chown $(whoami) /data
unzip punchplatform-<package>-<version>.zip -d /data
```
If you plan to deploy a CEPH cluster you need additional steps. First download the external archives corresponding to your deployer:
Next, move it to the punchplatform-<package>-<version>/archives directory and rename it ceph_<version>_deb.tgz.
Then install the Ceph archives on your deployment server:
```shell
tar -xvf ceph_<version>_deb.tgz
sudo yum install -y lttng-ust
sudo yum install -y ceph<version>/*.rpm
```
Remember deploying CEPH requires a Redhat or Centos deployer.
Deployer Environment Setup¶
Update your PATH so that punchplatform-deployer.sh is available:
Tips
We recommend starting your punch deployment with the punchplatform-1nodedatamanagement package!
Ubuntu / Centos / Redhat¶
```shell
cd punchplatform-<package>-<version>
# \$PATH is escaped so that it is expanded at login time, not frozen now
echo "export PATH=$(pwd)/bin:\$PATH" >> ~/.bashrc
```
MacOS¶
```shell
cd punchplatform-<package>-<version>
# \$PATH is escaped so that it is expanded at login time, not frozen now
echo "export PATH=$(pwd)/bin:\$PATH" >> ~/.bash_profile
```
Do not forget to reload your ~/.bashrc (~/.bash_profile on macOS) to take this environment update into account in your terminal: either log in again or source the file.
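Re-running these setup commands appends the same export line again each time. A small guard avoids that; the append_once helper below is hypothetical and not part of the punchplatform tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a startup file only if it is absent.

# append_once LINE FILE: add LINE to FILE unless an identical line exists
# (grep -x matches whole lines, -F treats LINE as a fixed string)
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage, from inside punchplatform-<package>-<version>:
#   append_once "export PATH=$(pwd)/bin:\$PATH" ~/.bashrc        # Linux
#   append_once "export PATH=$(pwd)/bin:\$PATH" ~/.bash_profile  # macOS
```

The same guard works for the PUNCHPLATFORM_CONF_DIR and PUNCHPLATFORM_LOG_DIR exports.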
Deployer Configuration directory¶
Next, create your platform configuration directory. This directory will hold the description of your target platform in the punchplatform.properties and punchplatform-deployment.settings files.
Ubuntu/Centos/Redhat:¶
```shell
cd ~
mkdir pp-deployment-conf
cd pp-deployment-conf
echo "export PUNCHPLATFORM_CONF_DIR=`pwd`" >> ~/.bashrc
cd ..
mkdir pp-deployment-logs
cd pp-deployment-logs
echo "export PUNCHPLATFORM_LOG_DIR=`pwd`" >> ~/.bashrc
```
Macos:¶
```shell
cd ~
mkdir pp-deployment-conf
cd pp-deployment-conf
echo "export PUNCHPLATFORM_CONF_DIR=`pwd`" >> ~/.bash_profile
cd ..
mkdir pp-deployment-logs
cd pp-deployment-logs
echo "export PUNCHPLATFORM_LOG_DIR=`pwd`" >> ~/.bash_profile
```
Do not forget to reload your startup file to take this environment update into account in your terminal (either log in again or source it):
```shell
# Ubuntu/Centos/Redhat
source ~/.bashrc
```
```shell
# Macos
source ~/.bash_profile
```
Check that it worked as expected. The output of the following commands must look like:
```shell
$ env | grep PUNCH
PUNCHPLATFORM_LOG_DIR=/Users/dimi/pp-deployment-logs
PUNCHPLATFORM_CONF_DIR=/Users/dimi/pp-deployment-conf
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/dimi/bin:/data/deployer/punchplatform-deployer-5.1.0/bin
```
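The same inspection can be scripted so that it fails loudly instead of relying on visual checks. This is a hypothetical sketch (check_dir and check_deployer_env are illustrative names): it only verifies that both variables point to existing directories and that punchplatform-deployer.sh is found on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check of the deployer environment variables.

# check_dir NAME VALUE: complain when VALUE is empty or not a directory
check_dir() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  elif [ ! -d "$2" ]; then
    echo "$1=$2 is not a directory"
    return 1
  fi
}

# check_deployer_env: non-zero exit status if anything is missing
check_deployer_env() {
  local status=0
  check_dir PUNCHPLATFORM_CONF_DIR "${PUNCHPLATFORM_CONF_DIR:-}" || status=1
  check_dir PUNCHPLATFORM_LOG_DIR "${PUNCHPLATFORM_LOG_DIR:-}" || status=1
  if ! command -v punchplatform-deployer.sh >/dev/null 2>&1; then
    echo "punchplatform-deployer.sh is not on PATH"
    status=1
  fi
  return "$status"
}

# Usage, after reloading your startup file:
#   check_deployer_env && echo "deployer environment looks fine"
```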
Target Servers Installation Guide¶
Requirements
The punchplatform is only supported on Ubuntu 16.04 or later, Centos 7 or Redhat 7 operating systems.
The choice of these operating systems is the result of extensive testing, running and tuning platforms on production systems for years. Leveraging these OSes, you will benefit from a great deal of feedback to tune and debug common issues.
The target servers are going to be installed with binaries and configurations to run the punchplatform components. The punchplatform deployer tool internally uses ansible to do that.
In this chapter we list the requirements and checks you must ensure before deploying.
Infrastructure Prerequisites
Make sure:
- All network interfaces are up and configured.
- Storage partitions are mounted and writable, except for the block devices that Ceph is meant to manage.
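A quick way to verify the storage point on each target is sketched below. The helper names and the /data example path are assumptions; is_mounted reads the standard /proc/mounts format, and dir_is_writable actually creates and removes a probe file as the current user. Do not point it at the raw block devices reserved for Ceph, which must stay unmounted.

```shell
#!/usr/bin/env bash
# Hypothetical storage checks for a Linux target server.

# is_mounted DIR [MOUNTS_FILE]: succeed when DIR appears as a mount point
# (field 2 of /proc/mounts is the mount point)
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' "${2:-/proc/mounts}"
}

# dir_is_writable DIR: try to create and delete a probe file in DIR
dir_is_writable() {
  local probe="$1/.punch_write_probe.$$"
  ( : > "$probe" ) 2>/dev/null || return 1
  rm -f "$probe"
}

# Usage (run as the account that will own the data):
#   is_mounted /data || echo "/data is not a mount point"
#   dir_is_writable /data || echo "/data is not writable by $(whoami)"
```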
Ceph specific requirements:
You must prevent the updatedb process (standard on Debian-like distributions) from scanning the whole system, especially the Ceph data partition and the punchplatform partition. You can do that in two ways.
Use the Ansible playbook provided in the official Punchplatform deployer to automatically patch the configuration file on multiple nodes. This playbook is in the updatedb_patch directory, at the deployer root directory. Its usage is documented in the playbook itself.
```shell
# add your ceph nodes in the inventory
vim inventory_updatedb_patch.inv
# apply playbook
ansible-playbook -i inventory_updatedb_patch.inv updatedb_patch.yml
```
Manual process
You can manually patch the /etc/updatedb.conf configuration file: add /var/lib/ceph to the PRUNEPATHS values on Ceph nodes, and add /data to the PRUNEPATHS values on all servers. Keep the paths already listed and append the new ones inside the same quotes, so that on a Ceph node /etc/updatedb.conf contains a line such as:
PRUNEPATHS="<existing paths> /var/lib/ceph /data"
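For the manual process, a hedged sketch of helpers that read and extend the PRUNEPATHS list is shown below. The function names are hypothetical; the parsing accepts both PRUNEPATHS="..." and PRUNEPATHS = "..." spellings, and the edit relies on GNU sed -i, so run it as root on the target servers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the PRUNEPATHS line of updatedb.conf.

# prunepaths_values [FILE]: print the space-separated PRUNEPATHS entries
prunepaths_values() {
  sed -n 's/^[[:space:]]*PRUNEPATHS[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p' \
    "${1:-/etc/updatedb.conf}"
}

# prunepaths_contains ENTRY [FILE]: succeed when ENTRY is an exact entry
# (padding with spaces avoids matching /dat inside /data)
prunepaths_contains() {
  case " $(prunepaths_values "${2:-/etc/updatedb.conf}") " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# add_prunepath ENTRY [FILE]: append ENTRY inside the existing quotes, once;
# does nothing if the file has no PRUNEPATHS line
add_prunepath() {
  local entry="$1" file="${2:-/etc/updatedb.conf}"
  prunepaths_contains "$entry" "$file" && return 0
  sed -i "s|^\([[:space:]]*PRUNEPATHS[[:space:]]*=[[:space:]]*\"[^\"]*\)\"|\1 $entry\"|" "$file"
}

# Usage (as root):
#   add_prunepath /data            # all servers
#   add_prunepath /var/lib/ceph    # Ceph nodes only
#   prunepaths_values
```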
Note
Preventing updatedb from scanning the whole system is necessary on a server exposing many files (typically the case on a Ceph server), as the updatedb internal database can quickly and dramatically grow.
Additional Elasticsearch Prerequisites
The execution prerequisites are described in the Elasticsearch public documentation; this section highlights some specificities that are often missed during server/OS setup and may cause deployment problems:
- The /tmp partition must not be mounted with the 'noexec' option, otherwise Elasticsearch will fail its bootstrap checks with the following error message in its log file:
  system call filters failed to install
- By default, Elasticsearch requires that its process memory can be locked to prevent swapping. The target servers (or virtual servers) must be able to deliver this feature, i.e. no specific hardening should prevent Elasticsearch from requesting a memory lock from the kernel. Elasticsearch checks this at startup time during its 'bootstrap checks' and fails if no memory, or not enough memory, could be locked.
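Both points can be checked before deploying. The sketch below uses hypothetical helper names and assumes a Linux target: mount_has_option parses the /proc/mounts format, and memlock_unlimited looks at the current shell's ulimit -l, which is only an indication since the Elasticsearch service gets its own limits from its service manager.

```shell
#!/usr/bin/env bash
# Hypothetical checks for the two Elasticsearch prerequisites above.

# mount_has_option MOUNTPOINT OPTION [MOUNTS_FILE]: succeed when the mount at
# MOUNTPOINT lists OPTION exactly (so "exec" does not match "noexec").
# If /tmp is not a separate mount there is no entry, and /tmp inherits /.
mount_has_option() {
  awk -v mp="$1" -v opt="$2" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
    }
    END { exit (found ? 0 : 1) }' "${3:-/proc/mounts}"
}

# memlock_unlimited [VALUE]: succeed when the max locked-memory limit
# (default: `ulimit -l` of this shell) is "unlimited"
memlock_unlimited() {
  [ "${1:-$(ulimit -l)}" = "unlimited" ]
}

# Usage:
#   mount_has_option /tmp noexec && echo "/tmp is noexec: bootstrap checks will fail"
#   memlock_unlimited || echo "memlock limited to $(ulimit -l) kB in this shell"
```

Once Elasticsearch runs, assuming it listens on localhost:9200, curl -s 'http://localhost:9200/_nodes?filter_path=**.mlockall' reports whether memory locking actually succeeded.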
System Prerequisites
- Administration access: an administration account must be provided, with SSH access from the installation environment to the servers. This account must be a sudoer, for instance to update the systemctl configuration.
- Naming resolution: naming resolution must be configured so that short and long hostnames are resolved. When resolving, from any target machine, the hostname of itself or of any other target machine (i.e. the output of the 'hostname' command), the result must be the production network interface (as opposed to any administration or supervision/monitoring dedicated network interface).
- Time synchronisation: a full time synchronisation infrastructure such as NTP must be configured and running.
- Repository: the standard repositories of the chosen operating system must be available. To check that they are correctly configured, we recommend updating all servers before the punch deployment. Internet access works, and so does a private repository. For CentOS deployments, the standard 'epel' repository must be enabled and available.
- System language: en_US.UTF-8
```shell
localectl | grep LANG
# expected output:
#   System Locale: LANG=en_US.utf8
# If this is not the case, change it:
sudo localectl set-locale LANG=en_US.utf8
```
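The naming, time and language prerequisites can be spot-checked on each target with the hypothetical helpers below. The parsers assume the usual systemd output formats: timedatectl prints "NTP synchronized: yes" on older systemd (CentOS 7, Ubuntu 16.04) and "System clock synchronized: yes" on newer versions.

```shell
#!/usr/bin/env bash
# Hypothetical spot checks for the system prerequisites above.

# resolve_host NAME: print the address(es) NAME resolves to through NSS
# (/etc/hosts and DNS); compare them with the production interface address
resolve_host() {
  getent hosts "$1"
}

# clock_synchronized: read `timedatectl status` output on stdin and succeed
# when it reports that the clock is synchronised
clock_synchronized() {
  grep -Eq '(NTP|System clock) synchronized: yes'
}

# locale_lang: read `localectl` output on stdin and print the LANG value
locale_lang() {
  sed -n 's/.*System Locale: LANG=\([^[:space:]]*\).*/\1/p'
}

# Usage on a target server:
#   resolve_host "$(hostname)"; resolve_host "$(hostname -f)"
#   timedatectl status | clock_synchronized || echo "clock not synchronised"
#   [ "$(localectl | locale_lang)" = "en_US.utf8" ] || echo "unexpected locale"
```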
The following packages and configuration are mandatory on all target servers:
```shell
# Ubuntu
sudo apt install python
# Centos/Redhat
sudo yum install python
# Centos only: disable firewalld
sudo systemctl disable firewalld
sudo systemctl stop firewalld
```
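Ansible modules need Python on the managed hosts, so Ansible's ping module is a convenient way to confirm both SSH access and Python on every target. The sketch below writes a throwaway INI inventory for that check; write_inventory and the group name punch_targets are hypothetical and unrelated to the deployer's own inventory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a throwaway Ansible INI inventory for a ping check.

# write_inventory FILE HOST...: write one group containing every HOST
write_inventory() {
  local file="$1"
  shift
  {
    echo "[punch_targets]"
    printf '%s\n' "$@"
  } > "$file"
}

# Usage (server names and account are examples):
#   write_inventory /tmp/targets.inv server1 server2 server3
#   ansible punch_targets -i /tmp/targets.inv -m ping -u <admin-account>
```

Each reachable host with a working Python answers "pong".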
Deploy the platform¶
You must now precisely define your platform by selecting the components you need. This is easy if you work with a ready-to-use package, as all the choices have been made for you.
If you selected the full deployer package, it is your task to create and fill two files: punchplatform.properties and punchplatform-deployment.settings.
We strongly suggest you first try the tutorials; they will help you progress step by step:
- Level 1: Getting started by deploying a zookeeper cluster
- Level 2: enrich the stack with apache storm
- Level 3: enrich the stack with an operator environment
These pages may also interest you: