
Troubleshooting index unavailability in an Elasticsearch cluster

Why do that

Because either the PunchPlatform has reported a health problem for an online index, or a direct call to the Elasticsearch REST API has reported a RED health status for an Elasticsearch index.

Data from this index is unavailable, or only partially available, through the Elasticsearch API. With daily indices, this probably means that a whole day of data is missing for the corresponding tenant in Kibana.
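The direct REST API check mentioned above can be sketched in Python. This snippet lists the indices that `GET _cat/indices?format=json` reports as RED. It uses only the standard library and assumes an unauthenticated HTTP endpoint. The function names and base URL are hypothetical, and a secured cluster would also need credentials and TLS settings.

```python
import json
import urllib.request


def fetch_json(base_url, path):
    """GET an Elasticsearch REST endpoint and decode its JSON response."""
    # Assumption: plain HTTP, no authentication (add credentials/TLS if needed).
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def red_indices(cat_indices_rows):
    """Sorted names of indices whose health is "red".

    `cat_indices_rows` is the decoded body of GET _cat/indices?format=json:
    a list of dicts carrying, among others, "health" and "index" keys.
    Rows without a health value (e.g. some closed indices) are ignored.
    """
    return sorted(
        row["index"] for row in cat_indices_rows if row.get("health") == "red"
    )
```

For example: `red_indices(fetch_json("http://localhost:9200", "/_cat/indices?format=json&h=health,status,index"))`. The filtering is done client side, so the same function also works on a saved copy of the `_cat/indices` output.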

What to do

Use the same approach as for under-replication (see TROUBLESHOOTING_Under-replication_in_an_Elasticsearch_cluster) to determine which nodes have failed.
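The under-replication guide's exact procedure is not reproduced here. As a complementary sketch, the snippet below compares the nodes returned by `GET _cat/nodes` with an expected node list. It also lists the unassigned primary shards from `GET _cat/shards`, since these are what turn an index RED. The expected node list is a hypothetical input (for instance, copied from your platform's deployment configuration), and the helper names are illustrative only.

```python
import json
import urllib.request


def fetch_json(base_url, path):
    """GET an Elasticsearch REST endpoint and decode its JSON response."""
    # Assumption: plain HTTP, no authentication.
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def missing_nodes(expected_nodes, cat_nodes_rows):
    """Expected node names absent from GET _cat/nodes?format=json&h=name."""
    present = {row["name"] for row in cat_nodes_rows}
    return sorted(set(expected_nodes) - present)


def unassigned_primaries(cat_shards_rows):
    """(index, shard, reason) tuples for unassigned primary shards."""
    return sorted(
        (row["index"], int(row["shard"]), row.get("unassigned.reason") or "")
        for row in cat_shards_rows
        if row.get("prirep") == "p" and row.get("state") == "UNASSIGNED"
    )


def diagnose(base_url, expected_nodes):
    """Query the cluster once and return (missing nodes, unassigned primaries)."""
    nodes = fetch_json(base_url, "/_cat/nodes?format=json&h=name")
    shards = fetch_json(
        base_url,
        "/_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason",
    )
    return missing_nodes(expected_nodes, nodes), unassigned_primaries(shards)
```

An `unassigned.reason` of `NODE_LEFT` on a primary shard is a strong hint that the index went RED because a node holding that shard left the cluster.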

If the failed nodes cannot be brought back into the cluster (and there is
**NO WAY** to make them return), the best option is to delete the index
through the ES REST API and reinject its data from another sink:

-   **Kafka**: currently the best solution if the lost index is recent and
    its data is still within the Kafka retention period. See
    [HOWTO_replay_logs_from_kafka_from_a_specific_date](HOWTO_replay_logs_from_kafka_from_a_specific_date.md).
-   **Object storage (such as Ceph) or file system archiving**: the best
    long-term choice if you opted for archiving. See the Archiving guide
    to perform an Object-to-ES reinjection.
-   **Elasticsearch**: a good alternative if you happen to have several ES
    clusters. Either run a full ES extraction and reinjection topology or,
    if the index is strictly identical in the other cluster, restore a
    snapshot of that index into this cluster.
-   **PunchPlatform's Ahmad Elasticsearch scheme**: same as above, but
    specifically with these clusters:
    -   In the "Month" cluster, restore the data from the archive built by
        the "Year" cluster.
    -   In the "Year" cluster, on the next day, export the day's archive by
        snapshotting data from the "Week" cluster, and store that archive
        in the "frozendata" space.
    -   In the "Week" cluster, decide whether the missing index needs to be
        reimported, using the archive built by the "Year" cluster.
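The snippet below sketches the two REST calls involved when the data comes back from another Elasticsearch cluster. It builds `DELETE /<index>` and `POST /_snapshot/<repository>/<snapshot>/_restore` as standard-library requests without sending them. The base URLs, repository and snapshot names are hypothetical. The sketch assumes the snapshot repository is already registered on the target cluster. Restoring an index requires that no open index with the same name already exists there, which is one reason to delete the RED index first. Deleting an index is irreversible, so check the index name before calling `send`.

```python
import json
import urllib.parse
import urllib.request


def _q(name):
    """Percent-encode a single path segment (index, repository, snapshot)."""
    return urllib.parse.quote(name, safe="")


def delete_index_request(base_url, index):
    """Build DELETE /<index>; sending it permanently drops the index."""
    return urllib.request.Request(f"{base_url}/{_q(index)}", method="DELETE")


def restore_snapshot_request(base_url, repository, snapshot, index):
    """Build POST /_snapshot/<repository>/<snapshot>/_restore for one index."""
    body = json.dumps({"indices": index}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_snapshot/{_q(repository)}/{_q(snapshot)}/_restore",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Execute a prepared request and decode the JSON response."""
    # Assumption: plain HTTP, no authentication.
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Building the requests separately from sending them makes it easy to print and review the exact URL before running a destructive call.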