Fix index reincarnation routing bug #6217

Merged
nadav-govari merged 5 commits into main from nadav/index_uid
Mar 25, 2026
@nadav-govari (Collaborator) commented Mar 25, 2026

Description

Node-based routing introduced a bug: the routing table did not track index_uid, so a new index_uid didn't clear old routing entries. As a result, deleting an index and recreating it with the same name, which is normally fine, resulted in unavailability.

This fixes that by:

  1. Clearing the routing entries for an index when a greater index incarnation is seen.
  2. Broadcasting 0 open shards on that node.
  3. Deleting the key from chitchat after that, which removes it from the routing tables.
  4. Eventually, once chitchat converges, there are 0 open nodes for the index.
  5. If a new request comes in for that index, it should see the old incarnation of the index, but with 0 nodes, and call the control plane, which triggers a merge from shards and gives us what we want.
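
Step 5 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RouteDecision` and `route` are hypothetical names, and the router's view of an index is reduced to a map from node ID to open shard count.

```rust
use std::collections::BTreeMap;

/// Hypothetical outcome of a routing lookup for one index.
#[derive(Debug, PartialEq, Eq)]
pub enum RouteDecision {
    /// At least one node advertises open shards: route to it.
    Node(String),
    /// No node advertises open shards: fall back to the control plane.
    AskControlPlane,
}

/// Picks the first node that still advertises open shards (assumed policy).
/// Nodes that broadcast 0 open shards, or whose key was already deleted,
/// are never selected, so a stale incarnation ends up with no candidates.
pub fn route(open_shards_per_node: &BTreeMap<String, usize>) -> RouteDecision {
    open_shards_per_node
        .iter()
        .find(|(_, open_shards)| **open_shards > 0)
        .map(|(node_id, _)| RouteDecision::Node(node_id.clone()))
        .unwrap_or(RouteDecision::AskControlPlane)
}
```

With only `node-a -> 0` in the map, `route` returns `AskControlPlane`; once a node advertises a non-zero count, it is selected instead.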

Added a unit test and an integration test.

@nadav-govari changed the title from "Nadav/index uid" to "Fix index reincarnation routing bug" Mar 25, 2026
};
let value = serde_json::to_string(&capacity)
    .expect("`IngesterCapacityScore` should be JSON serializable");
self.cluster.set_self_key_value(key, value).await;

Member

One cycle is too short. Let's just use `set_self_key_value_delete_after_ttl` and get rid of the `pending_removal` logic.

Collaborator Author

Much cleaner
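
To illustrate why a TTL-based delete is more forgiving than removing the key after a single gossip cycle, here is a toy model. It is not the chitchat or Quickwit API: `ToyNodeState`, its methods, and the `capacity:logs` key used in the usage note are hypothetical, and time is passed in explicitly to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy per-node key-value state (hypothetical, for illustration only).
/// Each entry holds a value and an optional deletion deadline.
pub struct ToyNodeState {
    entries: HashMap<String, (String, Option<Instant>)>,
    ttl: Duration,
}

impl ToyNodeState {
    pub fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    /// Sets a value with no deletion deadline.
    pub fn set(&mut self, key: &str, value: &str) {
        self.entries.insert(key.to_string(), (value.to_string(), None));
    }

    /// Sets a final value and schedules the key for deletion `ttl` after `now`.
    /// Peers keep seeing the final value until the deadline passes.
    pub fn set_delete_after_ttl(&mut self, key: &str, value: &str, now: Instant) {
        let deadline = now + self.ttl;
        self.entries
            .insert(key.to_string(), (value.to_string(), Some(deadline)));
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(|(value, _)| value.as_str())
    }

    /// Drops every entry whose deletion deadline has been reached.
    pub fn gc(&mut self, now: Instant) {
        self.entries.retain(|_, (_, delete_at)| match delete_at {
            Some(deadline) => now < *deadline,
            None => true,
        });
    }
}
```

In this model, a node would call `set_delete_after_ttl("capacity:logs", ...)` with a value advertising 0 open shards. That value stays readable through every `gc` call before the deadline, so peers that gossip slowly still observe it, and the key disappears only afterwards.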

@guilload (Member) left a comment

Logic is sound. Code is inelegant. The old routing table code also had useful comments:

match self.index_uid.cmp(index_uid) {
    // If we receive an update for a new incarnation of the index, then we clear the entry
    // and insert all the shards.
    std::cmp::Ordering::Less => {
        self.index_uid = index_uid.clone();
        self.clear_shards();
    }
    // If we receive an update for a previous incarnation of the index, then we ignore it.
    std::cmp::Ordering::Greater => {
        return;
    }
    std::cmp::Ordering::Equal => {}
};
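
For readers without the surrounding code, the same comparison can be shown in a self-contained form. The sketch below rests on assumptions: `IndexUid` is modeled as an index name plus an integer incarnation counter, and `RoutingEntry` and `apply_update` are hypothetical names rather than the types touched by this PR.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical index UID: an index name plus an incarnation counter that
/// grows each time the index is deleted and recreated. The derived `Ord`
/// compares `index_id` first, then `incarnation`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct IndexUid {
    pub index_id: String,
    pub incarnation: u64,
}

/// Hypothetical routing entry: open shard counts per node for one incarnation.
#[derive(Debug)]
pub struct RoutingEntry {
    pub index_uid: IndexUid,
    pub open_shards_per_node: BTreeMap<String, usize>,
}

impl RoutingEntry {
    pub fn new(index_uid: IndexUid) -> Self {
        Self { index_uid, open_shards_per_node: BTreeMap::new() }
    }

    /// Applies a capacity update. Returns `false` if the update was ignored.
    pub fn apply_update(&mut self, index_uid: &IndexUid, node_id: &str, open_shards: usize) -> bool {
        match self.index_uid.cmp(index_uid) {
            // Newer incarnation: adopt it and drop everything known about the old one.
            Ordering::Less => {
                self.index_uid = index_uid.clone();
                self.open_shards_per_node.clear();
            }
            // Older incarnation: stale update, ignore it.
            Ordering::Greater => return false,
            Ordering::Equal => {}
        }
        if open_shards == 0 {
            // A node with no open shards is not a routing candidate.
            self.open_shards_per_node.remove(node_id);
        } else {
            self.open_shards_per_node.insert(node_id.to_string(), open_shards);
        }
        true
    }
}
```

An update for incarnation 2 applied to an entry at incarnation 1 clears the old nodes before inserting the new one, while a late update for incarnation 1 is rejected afterwards.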

@nadav-govari merged commit e20f06c into main Mar 25, 2026
8 checks passed
@nadav-govari deleted the nadav/index_uid branch March 25, 2026 19:32