MON Training - Track3: Investigating the channel problems (performance/capacity, errors)

Once you have eliminated infrastructure/framework incidents, the most frequent problem is that some data is missing at the end of the chain (not arriving, or arriving late).

Where does the problem arise in my service chain?

  • Have a look at the custom dashboard to spot the first stage that is not nominal: backlog, failures, non-matching rates, low uptime

  • Have a look at the generic dashboards: are there failures, are there restarts, what is the application health, are there errors in the shiva logs?

  • Have a look at the log of the most upstream failing task (bottom-up investigation)

  • Be careful of flow control: a problem on a lumberjack receiver punchline can manifest as failures on the forwarder punchline
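The checklist above can be sketched as a small triage helper. This is only an illustration, not part of the platform: the `StageMetrics` fields, the thresholds and the stage names are all hypothetical, and real values would be read from your dashboards. The sketch walks the chain in data-flow order, reports the first stage that is not nominal, and also lists the stage immediately downstream, because (per the last bullet) a receiver-side problem can surface as failures on the forwarder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageMetrics:
    """Hypothetical snapshot of one stage, as read from the dashboards."""
    name: str
    backlog: int      # items waiting to be processed
    failures: int     # failures seen in the observation window
    uptime_s: float   # seconds since the last (re)start


# Assumed thresholds, for illustration only; tune them to your channels.
MAX_BACKLOG = 10_000
MIN_UPTIME_S = 300.0


def is_nominal(stage: StageMetrics) -> bool:
    """A stage is nominal if it has no failures, a bounded backlog and no recent restart."""
    return (
        stage.backlog <= MAX_BACKLOG
        and stage.failures == 0
        and stage.uptime_s >= MIN_UPTIME_S
    )


def first_non_nominal(chain: List[StageMetrics]) -> Optional[int]:
    """Return the index of the first non-nominal stage in data-flow order, or None."""
    for index, stage in enumerate(chain):
        if not is_nominal(stage):
            return index
    return None


def stages_to_inspect(chain: List[StageMetrics]) -> List[str]:
    """Name the stages whose logs are worth reading first.

    The first non-nominal stage is always listed. Its immediate downstream
    neighbour is listed too, since a downstream problem can show up as
    failures on the sending side.
    """
    index = first_non_nominal(chain)
    if index is None:
        return []
    names = [chain[index].name]
    if index + 1 < len(chain):
        names.append(chain[index + 1].name)
    return names


if __name__ == "__main__":
    # Hypothetical three-stage chain: the receiver restarted 40 seconds ago,
    # and the forwarder shows failures as a consequence.
    chain = [
        StageMetrics("collector", backlog=0, failures=0, uptime_s=86_400.0),
        StageMetrics("forwarder", backlog=50, failures=12, uptime_s=86_400.0),
        StageMetrics("receiver", backlog=0, failures=0, uptime_s=40.0),
    ]
    print(stages_to_inspect(chain))  # ['forwarder', 'receiver']
```

In this made-up example the forwarder is the first stage showing failures, but the receiver's low uptime is the more likely root cause, which is why both names are returned.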