# Troubleshooting Elasticsearch index unavailability
## Why do that
Because either the PunchPlatform has reported a health condition for an online index, or the Elasticsearch REST API has been queried directly and reported a RED health status for an Elasticsearch index.

No data from this index is available through the Elasticsearch API, which probably means a whole day of data is unavailable for the corresponding tenant in Kibana.
## What to do
Use the same approach as for under-replication (see TROUBLESHOOTING_Under-replication_in_an_Elasticsearch_cluster) to determine which nodes have failed.
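For example, the RED indices and the unassigned shards (and therefore the failed nodes) can be listed directly through the Elasticsearch REST API. The following is a minimal sketch, assuming a Python environment with the `requests` library and a cluster answering on `localhost:9200` without authentication; adapt the URL and credentials to your platform:

```python
import requests

ES = "http://localhost:9200"  # assumed endpoint: adapt to your deployment

# Overall cluster health: "red" means at least one primary shard is unassigned.
health = requests.get(f"{ES}/_cluster/health").json()
print("cluster status:", health["status"])

# List the indices that are currently RED.
for idx in requests.get(f"{ES}/_cat/indices",
                        params={"health": "red", "format": "json"}).json():
    print("red index:", idx["index"])

# Count unassigned shards and ask Elasticsearch why the first one cannot be
# allocated; the explanation usually points at the failed or missing nodes.
shards = requests.get(f"{ES}/_cat/shards", params={"format": "json"}).json()
print("unassigned shards:", sum(1 for s in shards if s["state"] == "UNASSIGNED"))
explain = requests.get(f"{ES}/_cluster/allocation/explain").json()
print(explain.get("allocate_explanation", explain))
```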
If the failed nodes cannot be brought back into the cluster (and there is **NO WAY** to make them return), the best option is to delete the index through the Elasticsearch REST API (a sketch is given after the list below) and reinject the data from another sink:
- **Kafka**: so far the best solution if the lost index is recent and the data is still within the Kafka retention period. See [HOWTO_replay_logs_from_kafka_from_a_specific_date](HOWTO_replay_logs_from_kafka_from_a_specific_date.md).
- **Object storage (such as CEPH) or file-system archiving**: the best long-term choice if you opted for archiving. See the Archiving guide to perform an object-storage-to-Elasticsearch reinjection.
- **Elasticsearch**: a good alternative if you happen to have several ES clusters. Either run a full ES extraction and reinjection topology, or, if the index is strictly identical, copy the index snapshot into the cluster (both options are sketched after this list).
- **PunchPlatform's Ahmad Elasticsearch scheme**: same as above, but more specifically with these clusters:
    - If you are in the "Month" cluster, restore the data from the archive built by the "Year" cluster.
    - If you are in the "Year" cluster, then on the next day, export an archive of today's data by snapshotting it from the "Week" cluster, and store that archive in the "frozendata" space.
    - If you are in the "Week" cluster, decide whether the missing index needs to be reimported from the archive built by the "Year" cluster.
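Deleting the lost index is a single REST call. Below is a minimal sketch, again in Python with `requests`; the index name is hypothetical, replace it with the RED index identified earlier:

```python
import requests

ES = "http://localhost:9200"
index = "mytenant-logs-2019.06.15"  # hypothetical name: use the RED index found earlier

resp = requests.delete(f"{ES}/{index}")
resp.raise_for_status()
print(resp.json())  # {"acknowledged": True} on success
```

Only delete the index once you are sure its data can be reinjected from one of the sinks listed above.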
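If a second Elasticsearch cluster still holds an identical copy of the data, the standard reindex-from-remote and snapshot-restore APIs can bring the index back. These are sketches under stated assumptions: the source cluster host must be declared in the destination cluster's `reindex.remote.whitelist` setting, and the host, repository, snapshot and index names below are hypothetical.

```python
import requests

DEST = "http://localhost:9200"            # cluster that lost the index
SOURCE = "http://other-es-cluster:9200"   # hypothetical second cluster still holding the data
index = "mytenant-logs-2019.06.15"        # hypothetical index name

# Copy the index from the remote cluster; wait_for_completion=false returns a
# task id so a large copy can run in the background.
body = {"source": {"remote": {"host": SOURCE}, "index": index},
        "dest": {"index": index}}
resp = requests.post(f"{DEST}/_reindex",
                     params={"wait_for_completion": "false"}, json=body)
resp.raise_for_status()
print("reindex task:", resp.json().get("task"))
```

If the copy exists as a snapshot (for instance an archive built by the "Year" cluster), restore only the missing index from it:

```python
import requests

ES = "http://localhost:9200"
repo, snapshot = "frozendata_repo", "snapshot_2019.06.15"  # hypothetical repository and snapshot names
index = "mytenant-logs-2019.06.15"                         # hypothetical index name

resp = requests.post(f"{ES}/_snapshot/{repo}/{snapshot}/_restore",
                     json={"indices": index, "include_global_state": False})
resp.raise_for_status()
print(resp.json())
```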