Troubleshooting automatic Ceph service reloading
Symptoms
The punchplatform storage cluster (Ceph) is sometimes unavailable. Recently, the issue has occurred in the morning, at around 6 AM.
In the system logs, other services are reloaded just before and/or after the Ceph error.
Example log excerpt:
May 28 06:25:02 server6 systemd[1]: Reloading LSB: Apache2 web server.
May 28 06:25:02 server6 apache2[21527]: * Reloading Apache httpd web server apache2
May 28 06:25:02 server6 apache2[21527]: *
May 28 06:25:02 server6 systemd[1]: Reloaded LSB: Apache2 web server.
May 28 06:25:02 server6 ceph-mon[1790]: 2018-05-28 06:25:02.419028 7fac930dd700 -1 received signal: Hangup from PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw UID: 0
May 28 06:25:02 server6 ceph-mgr[6394]: 2018-05-28 06:25:02.419759 7ff2ce737700 -1 received signal: Hangup from PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw UID: 0
May 28 06:25:02 server6 ceph-osd[7600]: 2018-05-28 06:25:02.421308 7f107347b700 -1 received signal: Hangup from PID: 21546 task name: killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw UID: 0
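To check whether a node is affected, you can search the system log for the same pattern: systemd reloading a service, and Ceph daemons receiving a Hangup signal at about the same time. The sketch below assumes the default Ubuntu log location, /var/log/syslog (reading it usually requires root or membership in the adm group). `find_reload_events` is a hypothetical helper name.

```shell
# Hypothetical helper: print the log lines that match the symptom pattern,
# i.e. systemd reloading a service, or a Ceph daemon logging
# "received signal: Hangup" (SIGHUP).
# Usage: find_reload_events [LOG_FILE]   (default: /var/log/syslog)
find_reload_events() {
    log_file="${1:-/var/log/syslog}"
    grep -E \
        'systemd\[1\]: Reload(ing|ed) |ceph-[a-z]+\[[0-9]+\]: .*received signal: Hangup' \
        "$log_file"
}
```

For example, `find_reload_events /var/log/syslog.1` scans the previous, already rotated log.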
Cause / Explanations
Since Ubuntu 16.04, a daemon runs daily to upgrade the system automatically, and this upgrade can include reloading running processes.
In most cases this behaviour is desirable, because applying security patches promptly keeps the system protected. However, reloading or restarting a daemon that manages storage, such as Ceph, is very harmful: it can make the storage cluster unavailable, as described in the symptoms above.
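To see whether this daily update is enabled on a given node, you can inspect the APT::Periodic options. APT reads the files in /etc/apt/apt.conf.d/ in alphanumeric order, and a later assignment of the same option overrides an earlier one. A value of "0" disables the corresponding periodic task. The shell function below (`show_periodic_settings`, a hypothetical helper) is only a rough sketch of that lookup. It understands only the one-line `APT::Periodic::Key "value";` form and ignores APT's nested `{ }` syntax, so `apt-config dump` remains the authoritative view.

```shell
# Hypothetical helper: approximate the effective APT::Periodic settings.
# Files in CONF_DIR are read in glob (name) order, and for each key the
# last value seen wins, mirroring how a later apt.conf.d file overrides an
# earlier one. Only one-line 'APT::Periodic::Key "value";' entries are
# recognised; APT's nested '{ }' syntax is not parsed.
# Usage: show_periodic_settings [CONF_DIR]   (default: /etc/apt/apt.conf.d)
show_periodic_settings() {
    conf_dir="${1:-/etc/apt/apt.conf.d}"
    for f in "$conf_dir"/*; do
        case "${f##*/}" in
            *.conf) ;;          # ".conf" extension: read
            *.*) continue ;;    # other extensions (e.g. .bak): skipped, roughly as APT does
        esac
        [ -f "$f" ] && cat "$f"
    done | awk '
        /^[ \t]*APT::Periodic::[A-Za-z-]+[ \t]+"[^"]*";/ {
            key = $1
            value = $2
            gsub(/[";]/, "", value)          # "1"; -> 1
            if (!(key in last)) order[++n] = key
            last[key] = value
        }
        END { for (i = 1; i <= n; i++) print order[i] " = " last[order[i]] }
    '
}
```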
What to do
We recommend disabling Ubuntu's automatic updates. To do so, edit the file /etc/apt/apt.conf.d/10periodic:
sudo vim /etc/apt/apt.conf.d/10periodic
# replace
APT::Periodic::Update-Package-Lists "1";
# with
APT::Periodic::Update-Package-Lists "0";
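Because every node of the Ceph cluster needs the same change, a non-interactive variant can be handy. The sketch below assumes GNU sed (the default on Ubuntu) and assumes the option is written on a single line, as in the excerpt above. `disable_package_list_updates` is a hypothetical helper name. It keeps a copy of the original file with a `.bak` suffix.

```shell
# Hypothetical helper: non-interactive equivalent of the manual edit above.
# Sets APT::Periodic::Update-Package-Lists to "0" in CONF_FILE and keeps
# the original as CONF_FILE.bak. Needs write access to the file (root).
# Usage: disable_package_list_updates [CONF_FILE]
#        (default: /etc/apt/apt.conf.d/10periodic)
disable_package_list_updates() {
    conf_file="${1:-/etc/apt/apt.conf.d/10periodic}"
    sed -i.bak \
        's/^\([[:blank:]]*APT::Periodic::Update-Package-Lists[[:blank:]]*\)"[0-9]*";/\1"0";/' \
        "$conf_file"
}
```

After defining the function, run it in a root shell (for example one opened with `sudo -i`). Then confirm the value APT actually uses with `apt-config dump | grep Periodic`. A file read after 10periodic may set the same option back to "1"; on many Ubuntu installations, 20auto-upgrades does this.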