MON Training - Track3: Investigating the channel problems (performance/capacity, errors)
Once you have eliminated infrastructure/framework incidents, the most frequent problem is that some data is missing at the end of the chain (not arriving, or arriving late).
Where does the problem arise in my service chain?
- Look at the custom dashboard and find the first stage that is not nominal: backlog, failures, non-matching rates, low uptime.
- Look at the generic dashboards: are there failures or restarts, what is the application health, are there errors in the shiva logs?
- Look at the log of the most upstream failing task (bottom-up investigation).
- Be careful of flow control: a problem on a lumberjack receiver punchline can manifest as failures on the forwarder punchline.
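The checklist above can be sketched as a small triage helper. This is only an illustration, not a platform API: the `StageMetrics` class, the function names and the thresholds are all hypothetical, and the values are assumed to be copied by hand from the dashboards. It walks the chain from upstream to downstream, returns the first stage that is not nominal, and also returns its immediate downstream neighbour, because with flow control the visible failures may only be a symptom of a problem one stage further down.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StageMetrics:
    """Hypothetical snapshot of one stage, as read off a dashboard."""
    name: str
    backlog: int          # messages waiting to be processed
    failures: int         # failures counted over the observation window
    uptime_seconds: int   # time since the last (re)start


def is_nominal(stage: StageMetrics,
               max_backlog: int = 10_000,
               min_uptime_seconds: int = 600) -> bool:
    """Thresholds are arbitrary assumptions; tune them per channel."""
    return (stage.backlog <= max_backlog
            and stage.failures == 0
            and stage.uptime_seconds >= min_uptime_seconds)


def first_suspects(
    chain: List[StageMetrics],
) -> Optional[Tuple[StageMetrics, Optional[StageMetrics]]]:
    """`chain` is ordered from upstream to downstream.

    Returns the first non-nominal stage together with the stage right
    after it (or None if it is the last one). Returns None when every
    stage looks nominal.
    """
    for index, stage in enumerate(chain):
        if not is_nominal(stage):
            downstream = chain[index + 1] if index + 1 < len(chain) else None
            return stage, downstream
    return None


# Example: the forwarder reports failures, but the receiver restarted
# 90 seconds ago, so the receiver log deserves a look as well.
example_chain = [
    StageMetrics("forwarder", backlog=50_000, failures=120, uptime_seconds=86_400),
    StageMetrics("receiver", backlog=0, failures=0, uptime_seconds=90),
    StageMetrics("indexer", backlog=0, failures=0, uptime_seconds=86_400),
]
suspect, neighbour = first_suspects(example_chain)
print(f"check logs of: {suspect.name}, then {neighbour.name}")
```

Returning the downstream neighbour alongside the first failing stage is a deliberate choice: it keeps the flow-control caveat visible, so the investigation does not stop at the forwarder when the receiver is the real cause.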