Add LWTRetryPolicy: retry CAS timeouts on same host with backoff by mykaul · Pull Request #783 · scylladb/python-driver

mykaul · 2026-04-01T09:13:02Z

Summary

LWT queries use Paxos consensus where the first replica (Paxos coordinator/leader) drives the consensus rounds. When a CAS write times out, retrying on a different host causes Paxos contention — the new coordinator must compete with the original, potentially causing cascading timeouts across the cluster.

Currently, no built-in retry policy retries CAS write timeouts; each one returns RETHROW immediately:

RetryPolicy.on_write_timeout: CAS → RETHROW
ExponentialBackoffRetryPolicy.on_write_timeout: CAS → RETHROW
DowngradingConsistencyRetryPolicy.on_write_timeout: CAS → RETHROW

This PR adds LWTRetryPolicy, a new retry policy that extends ExponentialBackoffRetryPolicy with LWT-aware behavior:

| Scenario | Decision | Rationale |
|---|---|---|
| CAS write timeout | RETRY same host + backoff | Stay on Paxos coordinator to avoid contention |
| Serial read timeout | RETRY same host + backoff | CAS read at serial CL, same coordinator logic |
| Serial unavailable | RETRY next host + backoff | Paxos quorum lost on this node, try another |
| Non-CAS operations | Delegate to parent | Standard `ExponentialBackoffRetryPolicy` behavior |
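The decision table above can be sketched as a small standalone function. This is an illustration only, not the code in this PR: it does not import the driver, `Decision`, `SERIAL_LEVELS`, and `lwt_decision` are hypothetical names, and the string event labels stand in for the driver's `on_write_timeout`, `on_read_timeout`, and `on_unavailable` hooks. It also assumes the retry limit applies to all three LWT cases.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical stand-ins for the driver's retry decisions."""
    RETRY_SAME_HOST = "retry on same host"
    RETRY_NEXT_HOST = "retry on next host"
    RETHROW = "rethrow"


# Consistency levels used by LWT (Paxos) reads.
SERIAL_LEVELS = {"SERIAL", "LOCAL_SERIAL"}


def lwt_decision(event, *, write_type=None, consistency=None,
                 retry_num=0, max_num_retries=3):
    """Return a Decision for LWT cases, or None to delegate to the parent policy.

    event is one of "write_timeout", "read_timeout", "unavailable".
    """
    is_cas_write = event == "write_timeout" and write_type == "CAS"
    is_serial_op = (event in ("read_timeout", "unavailable")
                    and consistency in SERIAL_LEVELS)

    if not (is_cas_write or is_serial_op):
        return None  # non-CAS operation: the parent policy decides

    # Assumption: once the retry budget is spent, every LWT case rethrows.
    if retry_num >= max_num_retries:
        return Decision.RETHROW

    if event == "unavailable":
        # Paxos quorum is not reachable from this node, so try another one.
        return Decision.RETRY_NEXT_HOST

    # Timeouts stay on the current coordinator to avoid Paxos contention.
    return Decision.RETRY_SAME_HOST
```

In the real policy each hook would also return a consistency level and a backoff delay alongside the decision; those are left out here to keep the branching visible.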

This is modeled after gocql's LWTRetryPolicy interface, which retries LWT queries on the same host to avoid Paxos contention. The key comment from gocql (line 188):

"Retrying on a different host is fine for normal (non-LWT) queries, but in case of LWTs it will cause Paxos contention and possibly even timeouts if other clients send statements touching the same partition to the same time."

Usage

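from cassandra.cluster import Cluster
from cassandra.policies import LWTRetryPolicy
from cassandra.query import SimpleStatement

# Use as the default retry policy
cluster = Cluster(default_retry_policy=LWTRetryPolicy(max_num_retries=3))

# Or assign to a specific statement (ks.users is a placeholder table)
statement = SimpleStatement("INSERT INTO ks.users (id, name) VALUES (1, 'a') IF NOT EXISTS")
statement.retry_policy = LWTRetryPolicy(max_num_retries=5)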

Changes

cassandra/policies.py: Added LWTRetryPolicy class (extends ExponentialBackoffRetryPolicy)
tests/unit/test_policies.py: Added LWTRetryPolicyTest with 21 tests

Tests

21 new tests covering:

CAS write timeout retries on same host with backoff
Backoff delay increases with retry attempts
Max retries exceeded → RETHROW
Consistency level preserved across retries
Non-CAS writes delegate to parent (SIMPLE→RETHROW, BATCH_LOG→RETRY, COUNTER→RETHROW)
Serial read timeout retries on same host (SERIAL and LOCAL_SERIAL)
Serial unavailable retries on next host
Non-serial operations delegate to parent policy
Request errors inherit parent behavior
Constructor defaults and customization
All methods return proper 3-tuples

All 103 tests in tests/unit/test_policies.py pass.
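To make the "backoff delay increases with retry attempts" check concrete, here is a minimal sketch of a capped exponential backoff. It is a generic formulation, not necessarily what `ExponentialBackoffRetryPolicy` computes: the function name `backoff_delay`, the parameter names, and the default values are assumptions, and jitter is omitted so the result is deterministic.

```python
def backoff_delay(retry_num, min_interval=0.1, max_interval=10.0):
    """Capped exponential backoff: the delay doubles per attempt up to max_interval.

    All names and defaults here are illustrative assumptions.
    """
    if retry_num < 0:
        raise ValueError("retry_num must be non-negative")
    return min(max_interval, min_interval * (2 ** retry_num))


# Delays for the first five attempts grow until the cap is reached.
delays = [backoff_delay(n) for n in range(5)]
```

A test in the spirit of the one listed above would assert that each entry in `delays` is larger than the previous one and that a very large `retry_num` yields exactly `max_interval`.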

LWT queries use Paxos consensus where the coordinator is the Paxos leader. Retrying on a different host causes Paxos contention: the new coordinator must compete with the original one, potentially causing cascading timeouts.

LWTRetryPolicy (extends ExponentialBackoffRetryPolicy) handles this by:

- CAS write timeouts: retry on SAME host with exponential backoff
- Serial consistency read timeouts: retry on SAME host with backoff
- Serial consistency unavailable: retry on NEXT host (paxos quorum lost)
- Non-CAS operations: delegate to base ExponentialBackoffRetryPolicy

Modeled after gocql's LWTRetryPolicy interface.

mykaul mentioned this pull request Apr 1, 2026

Fix SimpleStatement.is_lwt(): detect LWT from CQL query string #784

Draft

mykaul marked this pull request as draft April 1, 2026 12:46

mykaul wants to merge 1 commit into scylladb:master from mykaul:feature/lwt-retry-policy
