tests: fix load balancing policy tests for Scylla Raft topology #779
Draft
mykaul wants to merge 4 commits into scylladb:master
Pull request overview
Fixes integration load balancing policy tests to be compatible with modern Raft-enabled Scylla behavior, where certain topology changes are rejected if any nodes are down, and shard-aware routing can distribute token-aware traffic across multiple replicas.
Changes:
- Reorders node start/stop vs. decommission/bootstrap operations in RoundRobinPolicy tests to avoid Raft topology-change rejection when a node is dead.
- Updates `test_token_aware_with_rf_2` to accept shard-aware routing distributing requests across both replicas (while still ensuring all requests go to replicas only).
Scylla's Raft topology coordinator rejects decommission when there are dead nodes in the cluster. Restart node 3 before decommissioning node 1.
Scylla's Raft topology coordinator rejects bootstrap when there are dead nodes in the cluster. Bootstrap node 5 before force-stopping node 1.
Scylla's Raft topology coordinator rejects bootstrap when there are dead nodes in the cluster. Bootstrap node 5 before force-stopping node 1.
Scylla's shard-aware routing may distribute TokenAwarePolicy queries across both replicas instead of always picking the first one. Assert that the total query count across both replicas equals 12.
17d215f to 69611fa
Summary
Four integration tests in `tests/integration/long/test_loadbalancingpolicies.py` fail against modern Scylla (>= 2026.1, and likely earlier Raft-enabled versions). This PR fixes all four, each in a separate commit.
Problem
Commits 1–3: Raft rejects topology changes with dead nodes
Scylla's Raft topology coordinator rejects `decommission` and `bootstrap` operations when there are dead (unreachable) nodes in the cluster. Three tests trigger this by calling `force_stop()` on a node and then immediately attempting a topology change while that node is still down:
- `test_roundrobin`: `force_stop(3)` → `decommission(1)`. Decommission fails because node 3 is dead.
- `test_roundrobin_two_dcs`: `force_stop(1)` → `bootstrap(5, 'dc3')`. Bootstrap fails because node 1 is dead.
- `test_roundrobin_two_dcs_2`: `force_stop(1)` → `bootstrap(5, 'dc1')`. Bootstrap fails because node 1 is dead.
Fix: Reorder the operations so the topology change (decommission/bootstrap) happens before the node is killed, or the dead node is restarted before the topology change:
- `test_roundrobin`: Restart node 3 (`start(3)` + `wait_for_up`) before decommissioning node 1.
- `test_roundrobin_two_dcs`: Bootstrap node 5 (`bootstrap(5, 'dc3')` + `wait_for_up`) before force-stopping node 1.
- `test_roundrobin_two_dcs_2`: Bootstrap node 5 (`bootstrap(5, 'dc1')` + `wait_for_up`) before force-stopping node 1.
The test semantics are preserved: the same nodes end up alive/dead/decommissioned, and the same query distribution assertions hold.
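The reordering can be illustrated with a small toy model. This is only a sketch: `ToyCluster` and `TopologyChangeRejected` are hypothetical names invented here, not the ccm helpers or anything from the driver's test suite, and the initial node counts are assumed. The model encodes a single rule, that a topology change is rejected while any node is down, and shows why the old ordering fails and the new ordering passes.

```python
class TopologyChangeRejected(Exception):
    """Raised by the toy cluster when a topology change is tried with a dead node."""


class ToyCluster:
    """Hypothetical stand-in for a Raft-enabled cluster (not the ccm or driver API)."""

    def __init__(self, node_ids):
        self.up = set(node_ids)
        self.down = set()

    def force_stop(self, node):
        self.up.discard(node)
        self.down.add(node)

    def start(self, node):
        self.down.discard(node)
        self.up.add(node)

    def _require_no_dead_nodes(self, operation):
        # The one rule being modelled: no topology change while a node is dead.
        if self.down:
            raise TopologyChangeRejected(
                f"{operation} rejected: dead nodes {sorted(self.down)}"
            )

    def decommission(self, node):
        self._require_no_dead_nodes("decommission")
        self.up.discard(node)

    def bootstrap(self, node):
        self._require_no_dead_nodes("bootstrap")
        self.up.add(node)


def old_order_roundrobin():
    cluster = ToyCluster([1, 2, 3])  # assumed three-node start
    cluster.force_stop(3)
    cluster.decommission(1)  # raises: node 3 is still dead
    return cluster


def new_order_roundrobin():
    cluster = ToyCluster([1, 2, 3])  # assumed three-node start
    cluster.force_stop(3)
    cluster.start(3)  # bring node 3 back before the topology change
    cluster.decommission(1)
    return cluster


def new_order_two_dcs():
    cluster = ToyCluster([1, 2, 3, 4])  # assumed four-node start
    cluster.bootstrap(5)  # topology change first, while every node is up
    cluster.force_stop(1)
    return cluster
```

In this model `old_order_roundrobin()` raises `TopologyChangeRejected`, while the two reordered variants complete. The real tests additionally wait for each node to come up before proceeding, which the toy model does not represent.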
Commit 4: Shard-aware routing distributes across replicas
`test_token_aware_with_rf_2` hardcodes the expectation that all 12 `TokenAwarePolicy` queries with RF=2 go to a single node (node 2). With Scylla's shard-aware routing, queries may be distributed across both replicas (e.g., `{node2: 5, node3: 7}`), since the driver can route to the specific shard owning the token on either replica.
Fix: Instead of asserting `node2 == 12, node3 == 0`, assert that `node1 == 0` (not a replica) and `node2 + node3 == 12` (all queries go to replicas). The second assertion block (after stopping node 2) remains unchanged: with only one replica alive, all 12 queries correctly go to node 3.
Test results
Full suite: 16 passed, 1 skipped (the skipped test is `test_token_aware_with_transient_replication`, gated on Cassandra 4.0+).
Tested with: Scylla `release:2026.1`, Python 3.14, `EVENT_LOOP_MANAGER=asyncio`, `PROTOCOL_VERSION=4`.
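For reference, the relaxed assertion described under Commit 4 could look roughly like the sketch below. It works on a plain dict of per-node query counts rather than the test suite's own coordinator bookkeeping, and the function name `check_queries_hit_only_replicas` and the `"node1"`-style keys are hypothetical, chosen only for illustration.

```python
def check_queries_hit_only_replicas(counts, replicas, non_replicas, expected_total):
    """Assert that every query went to a replica, without fixing the per-replica split.

    counts maps a node label to the number of queries it coordinated
    (hypothetical shape, not the test suite's data structure).
    """
    for node in non_replicas:
        assert counts.get(node, 0) == 0, f"{node} is not a replica but was queried"
    total = sum(counts.get(node, 0) for node in replicas)
    assert total == expected_total, f"expected {expected_total} replica queries, got {total}"


# A split such as the one quoted in the description is accepted ...
check_queries_hit_only_replicas(
    {"node2": 5, "node3": 7},
    replicas=["node2", "node3"],
    non_replicas=["node1"],
    expected_total=12,
)

# ... and so is the old "everything on node 2" outcome.
check_queries_hit_only_replicas(
    {"node2": 12},
    replicas=["node2", "node3"],
    non_replicas=["node1"],
    expected_total=12,
)
```

The check is deliberately weaker than the original `node2 == 12` assertion: it still fails if a non-replica serves any query or if queries go missing, but it no longer depends on which replica shard-aware routing picks.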