Troubleshooting automatic Ceph service reloading

Symptoms

The PunchPlatform storage cluster (Ceph) is sometimes unavailable. Recently, this has been happening in the morning, at around 6 AM.

In the system logs, other services (for example the Apache web server) are reloaded just before and/or after the Ceph error.

Example log excerpt:

May 28 06:25:02 server6 systemd[1]: Reloading LSB: Apache2 web server.
May 28 06:25:02 server6 apache2[21527]:  * Reloading Apache httpd web server apache2
May 28 06:25:02 server6 apache2[21527]:  *
May 28 06:25:02 server6 systemd[1]: Reloaded LSB: Apache2 web server.
May 28 06:25:02 server6 ceph-mon[1790]: 2018-05-28 06:25:02.419028 7fac930dd700 -1 received  signal: Hangup from  PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
May 28 06:25:02 server6 ceph-mgr[6394]: 2018-05-28 06:25:02.419759 7ff2ce737700 -1 received  signal: Hangup from  PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
May 28 06:25:02 server6 ceph-osd[7600]: 2018-05-28 06:25:02.421308 7f107347b700 -1 received  signal: Hangup from  PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw  UID: 0
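To check whether a node is affected, you can search its system log for these Hangup lines. The Python sketch below is illustrative only: the names find_ceph_hangups and HangupEvent are hypothetical, and the regular expression is written against the log format shown above, so other Ceph or syslog versions may need adjustments.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical pattern, derived from the example log lines above.
_HANGUP_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>ceph-[\w-]+|radosgw)\[(?P<pid>\d+)\]:"
    r".*received\s+signal:\s+Hangup\s+from\s+PID:\s+(?P<sender>\d+)"
    r"\s+task name:\s+(?P<task>.*?)\s+UID:"
)


@dataclass
class HangupEvent:
    when: str        # syslog timestamp, e.g. "May 28 06:25:02"
    host: str        # host name as written by syslog
    daemon: str      # ceph-mon, ceph-mgr, ceph-osd, ... or radosgw
    pid: int         # PID of the Ceph daemon that received SIGHUP
    sender_pid: int  # PID of the process that sent the signal
    task: str        # command line of the sending process


def find_ceph_hangups(lines: Iterable[str]) -> List[HangupEvent]:
    """Return one HangupEvent per Ceph log line reporting a received Hangup."""
    events = []
    for line in lines:
        m = _HANGUP_RE.search(line)
        if m:  # lines from other services (apache2, systemd) are skipped
            events.append(HangupEvent(
                when=m["when"], host=m["host"], daemon=m["daemon"],
                pid=int(m["pid"]), sender_pid=int(m["sender"]),
                task=m["task"],
            ))
    return events
```

For example, passing an open handle on /var/log/syslog (the assumed default syslog file on Ubuntu) returns the matching events; several events sharing the same sender_pid show that a single command signalled all the Ceph daemons at once.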

Cause / Explanations

Since Ubuntu 16.04, the operating system runs a daily job that automatically upgrades the system. This includes reloading the affected processes.

In most cases this behaviour is desirable, because it keeps the system secure by patching vulnerabilities promptly. However, restarting or reloading a daemon that manages storage, such as the Ceph daemons, can be very harmful.

What to do

We recommend disabling Ubuntu's automatic updates. To do so, edit the file /etc/apt/apt.conf.d/10periodic:

sudo vim /etc/apt/apt.conf.d/10periodic
# Replace:
APT::Periodic::Update-Package-Lists "1";
# with:
APT::Periodic::Update-Package-Lists "0";
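When several storage nodes must be changed, the same edit can be scripted instead of done by hand in vim. The Python sketch below is our own assumption, not part of the platform: the function names disable_package_list_updates and disable_in_file are hypothetical, the sketch only handles the flat one-line form KEY "VALUE"; shown above (not apt's nested-brace syntax), and it keeps a .bak copy of the file before writing.

```python
import re
import shutil

CONF_PATH = "/etc/apt/apt.conf.d/10periodic"
KEY = "APT::Periodic::Update-Package-Lists"

# Matches an uncommented line such as: APT::Periodic::Update-Package-Lists "1";
_LINE_RE = re.compile(
    r'^([ \t]*' + re.escape(KEY) + r'\s+)"[^"]*"(\s*;)', re.MULTILINE
)


def disable_package_list_updates(conf_text: str) -> str:
    """Return conf_text with the periodic package-list update set to "0"."""
    new_text, count = _LINE_RE.subn(r'\1"0"\2', conf_text)
    if count == 0:
        # Key absent: append it so the setting is explicit.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f'{KEY} "0";\n'
    return new_text


def disable_in_file(path: str = CONF_PATH) -> bool:
    """Rewrite path in place (needs root). Returns True if the file changed."""
    with open(path) as f:
        original = f.read()
    updated = disable_package_list_updates(original)
    if updated == original:
        return False  # already disabled, nothing to do
    shutil.copy2(path, path + ".bak")  # keep a backup of the original file
    with open(path, "w") as f:
        f.write(updated)
    return True
```

Calling disable_in_file() as root on each node applies the change; the function is idempotent, so running it twice is harmless. The resulting value can then be checked with apt-config dump.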