
Troubleshooting Ceph insertion failures

Why do that

You have configured an archiving topology (or channel) writing to Ceph; logs arrive in your punchline but never show up on your archiving system. You want to investigate and fix it.

What to do

  1. Check the Ceph cluster status. On a PunchPlatform administration station, run the following command:

    ceph -c /etc/ceph/MyClusterName.conf
    

    If an error occurs, check your cluster name (usually main), the existence of the configuration file (MyClusterName.conf) and of the administration keyring (/etc/ceph/MyClusterName.client.admin.keyring), and the file permissions.

    If the command succeeds, you now have a Ceph shell. Run the following command:

    status
    
  2. If you obtain HEALTH_OK or HEALTH_WARN, your Ceph cluster is fine and you can proceed with the next steps. If you obtain anything else, stop this procedure and fix your Ceph cluster according to the other troubleshooting procedures.
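    If you prefer a one-shot check without entering the interactive shell, the same information is available directly from the ceph command line; a minimal sketch, assuming your cluster is named main (adapt the configuration path to your deployment):

        # Non-interactive health and status check (cluster name "main" is an assumption)
        ceph -c /etc/ceph/main.conf health
        ceph -c /etc/ceph/main.conf status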

  3. Check in the Storm UI that tuples are acknowledged by the transactional KafkaInput (mandatory in an archiving topology). If not, no logs have been fetched from Kafka: check your KafkaInput configuration (topic name, cluster name, ...) to resolve it.
  4. If tuples are acknowledged by the transactional KafkaInput but not by the FileBolt, check whether component errors appear in the Storm UI and try to resolve them.
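    The acknowledgement counters used in the two previous steps can also be read from the Storm UI REST API, which is convenient on headless stations; a sketch, assuming the Storm UI listens on stormui.example.org:8080 (hypothetical host and port):

        # List running topologies and their ids (Storm UI REST API)
        curl -s http://stormui.example.org:8080/api/v1/topology/summary
        # Per-component statistics (acked tuples per spout and bolt) for a given topology id
        curl -s http://stormui.example.org:8080/api/v1/topology/<topology-id>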
  5. If no error appears, you need to access the dedicated Storm worker logs. Find in the Storm UI the machine running the topology, connect to it through SSH and read the worker logs (/var/log/punchplatform/storm/workers/YourWorker). Check whether the following lines appear in the logs (a grep one-liner is given after the excerpt):
    An irrecoverable stack overflow has occurred. Please check if any of your loaded .so files has enabled executable stack (see man page execstack(8))

    A fatal error has been detected by the Java Runtime Environment:

    SIGSEGV (0xb) at pc=0x00007f25c786ca8d, pid=21376, tid=0x00007f26a0f26700

    JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11) Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops) Problematic frame: C [libceph-common.so.0+0x55aa8d] Option::Option(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Option::type_t, Option::level_t)+0xfd

    Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

    An error report file with more information is saved as: /data/storm/workers/4cd002ac-e4c5-4deb-a730-698d869b9f06/hs_err_pid21376.log

    If you would like to submit a bug report, please visit: http://bugreport.java.com/bugreport/crash.jsp The crash happened outside the Java Virtual Machine in native code. See problematic frame for where to report the bug.
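A quick way to check all workers at once is to search for the crash signature directly in the worker log directory; a minimal sketch, using the log path mentioned above (adjust it to your deployment):

    # Search every worker log for the JVM native crash signature
    grep -rl "A fatal error has been detected by the Java Runtime Environment" \
        /var/log/punchplatform/storm/workers/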

If these lines appear, it means that the JVM options of your archiving topology are wrong. In the storm_settings.topology.worker.childopts field of the topology configuration file, you might have something like:

    -server -Xms300m -Xmx300m -Xss512k -XX:CompressedClassSpaceSize=120m -XX:MaxMetaspaceSize=120m -XX:ReservedCodeCacheSize=20m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=2 -XX:MaxDirectMemorySize=64m -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -XX:+AlwaysPreTouch

Replace it (adapting the memory values to your deployment) with the minimal set below; as the crash report shows, the failure happens in the native Ceph library (libceph-common.so), outside the JVM, so the extra restrictive options should be removed:

    -server -Xms300m -Xmx300m

Finally, restart your archiving topology (or channel) and check that logs are successfully written to your archiving system (Ceph).
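To confirm end to end that objects are actually landing in Ceph, you can re-check the cluster health and list a few objects in the archive pool; a sketch, assuming the cluster is named main and the archive pool is called mytenant-data (both names are assumptions, use your own):

    # Re-check cluster health (cluster name "main" is an assumption)
    ceph -c /etc/ceph/main.conf health
    # List a few objects in the archive pool (pool name "mytenant-data" is an assumption)
    rados -c /etc/ceph/main.conf -p mytenant-data ls | head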