tests: fix load balancing policy tests for Scylla Raft topology by mykaul · Pull Request #779 · scylladb/python-driver

mykaul · 2026-03-31T20:05:07Z

Summary

Four integration tests in tests/integration/long/test_loadbalancingpolicies.py fail against modern Scylla (>= 2026.1, and likely earlier Raft-enabled versions). This PR fixes all four, each in a separate commit.

Problem

Commits 1–3: Raft rejects topology changes with dead nodes

Scylla's Raft topology coordinator rejects decommission and bootstrap operations when there are dead (unreachable) nodes in the cluster. Three tests trigger this by calling force_stop() on a node and then immediately attempting a topology change while that node is still down:

test_roundrobin: force_stop(3) → decommission(1) — decommission fails because node 3 is dead.
test_roundrobin_two_dcs: force_stop(1) → bootstrap(5, 'dc3') — bootstrap fails because node 1 is dead.
test_roundrobin_two_dcs_2: force_stop(1) → bootstrap(5, 'dc1') — bootstrap fails because node 1 is dead.

Fix: Reorder the operations so the topology change (decommission/bootstrap) happens before the node is killed, or the dead node is restarted before the topology change:

test_roundrobin: Restart node 3 (start(3) + wait_for_up) before decommissioning node 1.
test_roundrobin_two_dcs: Bootstrap node 5 (bootstrap(5, 'dc3') + wait_for_up) before force-stopping node 1.
test_roundrobin_two_dcs_2: Bootstrap node 5 (bootstrap(5, 'dc1') + wait_for_up) before force-stopping node 1.

The test semantics are preserved — the same nodes end up alive/dead/decommissioned, and the same query distribution assertions hold.
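The ordering constraint behind commits 1–3 can be illustrated with a small toy model. This is only a sketch: `replay`, `TOPOLOGY_OPS`, and the starting node sets are hypothetical and are not part of ccm, the driver's test helpers, or Scylla. It encodes the rule described above (a topology change is rejected while any node is down) and replays the old and new step orders. The queries and `wait_for_up` calls that the real tests run between steps are omitted.

```python
# Toy model only: a sketch of the rule "no topology change while a node is dead".
TOPOLOGY_OPS = {"decommission", "bootstrap"}


def replay(initial_nodes, steps, reject_on_dead=True):
    """Replay (op, node) steps and report the first rejected step, if any."""
    up, down, removed = set(initial_nodes), set(), set()
    for op, node in steps:
        if reject_on_dead and op in TOPOLOGY_OPS and down:
            # Assumed Raft behaviour: any dead node blocks the topology change.
            return {"rejected": (op, node), "up": up, "down": down, "removed": removed}
        if op == "force_stop":
            up.discard(node)
            down.add(node)
        elif op == "start":
            down.discard(node)
            up.add(node)
        elif op == "decommission":
            up.discard(node)
            removed.add(node)
        elif op == "bootstrap":
            up.add(node)
        else:
            raise ValueError(f"unknown operation: {op}")
    return {"rejected": None, "up": up, "down": down, "removed": removed}


# test_roundrobin, assuming the cluster starts with nodes 1-3.
old_rr = [("force_stop", 3), ("decommission", 1)]
new_rr = [("force_stop", 3), ("start", 3), ("decommission", 1)]

# test_roundrobin_two_dcs, assuming the cluster starts with nodes 1-4.
old_dcs = [("force_stop", 1), ("bootstrap", 5)]
new_dcs = [("bootstrap", 5), ("force_stop", 1)]

print(replay({1, 2, 3}, old_rr)["rejected"])      # old order: decommission is rejected
print(replay({1, 2, 3}, new_rr)["rejected"])      # new order: nothing is rejected
print(replay({1, 2, 3, 4}, old_dcs)["rejected"])  # old order: bootstrap is rejected
print(replay({1, 2, 3, 4}, new_dcs))              # new order: node 5 up, node 1 down
```

Passing `reject_on_dead=False` models a cluster that tolerates the old order. For the two bootstrap tests, that run ends with the same up and down sets as the reordered steps, which is the sense in which the semantics are preserved.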

Commit 4: Shard-aware routing distributes across replicas

test_token_aware_with_rf_2 hardcodes the expectation that all 12 TokenAwarePolicy queries with RF=2 go to a single node (node 2). With Scylla's shard-aware routing, queries may be distributed across both replicas (e.g., {node2: 5, node3: 7}), since the driver can route to the specific shard owning the token on either replica.

Fix: Instead of asserting node2 == 12, node3 == 0, assert that node1 == 0 (not a replica) and node2 + node3 == 12 (all queries go to replicas). The second assertion block (after stopping node 2) remains unchanged — with only one replica alive, all 12 queries correctly go to node 3.
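The relaxed assertion can be sketched as a standalone check. The helper below is hypothetical (`replica_only` and the plain dict of per-node query counts are illustrative stand-ins, not the test suite's actual helpers). It accepts any split of the 12 queries between the two replicas and rejects any query that lands on a non-replica.

```python
def replica_only(counts, replicas, total):
    """Return True when every one of `total` queries was served by a replica.

    `counts` maps node id -> number of queries that node coordinated.
    """
    on_replicas = sum(counts.get(node, 0) for node in replicas)
    overall = sum(counts.values())
    return on_replicas == total and overall == total


# Distribution quoted in the description: both replicas share the load.
print(replica_only({2: 5, 3: 7}, replicas={2, 3}, total=12))
# The previously hardcoded outcome still satisfies the relaxed check.
print(replica_only({2: 12, 3: 0}, replicas={2, 3}, total=12))
# A query coordinated by node 1 (not a replica) fails the check.
print(replica_only({1: 1, 2: 5, 3: 6}, replicas={2, 3}, total=12))
```

Checking the sum rather than a per-node count keeps the test independent of which replica the driver picks for a given token, while still catching routing to non-replicas.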

Test results

Full suite: 16 passed, 1 skipped (the skipped test is test_token_aware_with_transient_replication, gated on Cassandra 4.0+).

tests/...::test_black_list_with_host_filter_policy PASSED
tests/...::test_dc_aware_roundrobin_one_remote_host PASSED
tests/...::test_dc_aware_roundrobin_two_dcs PASSED
tests/...::test_dc_aware_roundrobin_two_dcs_2 PASSED
tests/...::test_roundrobin PASSED
tests/...::test_roundrobin_two_dcs PASSED
tests/...::test_roundrobin_two_dcs_2 PASSED
tests/...::test_token_aware PASSED
tests/...::test_token_aware_composite_key PASSED
tests/...::test_token_aware_is_used_by_default PASSED
tests/...::test_token_aware_prepared PASSED
tests/...::test_token_aware_with_local_table PASSED
tests/...::test_token_aware_with_rf_2 PASSED
tests/...::test_token_aware_with_shuffle_rf2 PASSED
tests/...::test_token_aware_with_shuffle_rf3 PASSED
tests/...::test_token_aware_with_transient_replication SKIPPED
tests/...::test_white_list PASSED
============ 16 passed, 1 skipped =============

Tested with: Scylla release:2026.1, Python 3.14, EVENT_LOOP_MANAGER=asyncio, PROTOCOL_VERSION=4.

Copilot

Pull request overview

Fixes integration load balancing policy tests to be compatible with modern Raft-enabled Scylla behavior, where certain topology changes are rejected if any nodes are down, and shard-aware routing can distribute token-aware traffic across multiple replicas.

Changes:

Reorders node start/stop vs. decommission/bootstrap operations in RoundRobinPolicy tests to avoid Raft topology-change rejection when a node is dead.
Updates test_token_aware_with_rf_2 to accept shard-aware routing distributing requests across both replicas (while still ensuring all requests go to replicas only).

mykaul requested a review from Copilot March 31, 2026 20:06

Copilot started reviewing on behalf of mykaul March 31, 2026 20:07

Copilot AI reviewed Mar 31, 2026

mykaul added 4 commits April 3, 2026 23:11

tests: fix test_roundrobin decommission with dead node

3965951

Scylla's Raft topology coordinator rejects decommission when there are dead nodes in the cluster. Restart node 3 before decommissioning node 1.

tests: fix test_roundrobin_two_dcs bootstrap with dead node

a883948

Scylla's Raft topology coordinator rejects bootstrap when there are dead nodes in the cluster. Bootstrap node 5 before force-stopping node 1.

tests: fix test_roundrobin_two_dcs_2 bootstrap with dead node

d652713

Scylla's Raft topology coordinator rejects bootstrap when there are dead nodes in the cluster. Bootstrap node 5 before force-stopping node 1.

tests: fix test_token_aware_with_rf_2 for Scylla shard-aware routing

69611fa

Scylla's shard-aware routing may distribute TokenAwarePolicy queries across both replicas instead of always picking the first one. Assert that the total query count across both replicas equals 12.

mykaul force-pushed the fix/test-roundrobin-decommission-dead-node branch from 17d215f to 69611fa April 3, 2026 20:20

mykaul wants to merge 4 commits into scylladb:master from mykaul:fix/test-roundrobin-decommission-dead-node
