perf: buffer accumulation in _write_query_params() reduces f.write() calls by mykaul · Pull Request #790 · scylladb/python-driver

mykaul · 2026-04-04T14:22:08Z

Summary

Replace per-parameter write_value(f, param) loops with buffer accumulation (list.append + b"".join + single f.write()), reducing f.write() calls from (2*N + 1) to 1 for N query parameters in the execute/query path.

This supersedes the closed PR #788 (inlining approach). Buffer accumulation is strictly superior: it achieves equal or better speedups in every scenario while producing a smaller, cleaner diff.

Motivation

Every CQL query/execute call serializes query parameters via write_value(f, param), which does 2 f.write() calls per parameter (length prefix + data). For queries with vector embeddings (128–1536 dimensions), this creates many small writes per message.

Buffer accumulation collects the encoded chunks in a Python list, joins them, and writes once. This eliminates the per-parameter write_value() call and replaces many small f.write() calls with a single larger one.
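A minimal sketch of the two shapes, for illustration only. It is not the driver's code: the function names, the `UNSET` sentinel, and the `CountingBuffer` helper are hypothetical. The sketch assumes the leading write is a 16-bit parameter count (consistent with the 2*N + 1 figure above) and uses the CQL `[value]` encoding: an int32 length prefix, with -1 for NULL and -2 for "not set".

```python
import io
import struct

_int32_pack = struct.Struct(">i").pack   # big-endian signed 32-bit length prefix
_uint16_pack = struct.Struct(">H").pack  # big-endian unsigned 16-bit count
UNSET = object()  # hypothetical sentinel standing in for the driver's unset marker


class CountingBuffer(io.BytesIO):
    """BytesIO that counts write() calls (illustration helper, not in the PR)."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, data):
        self.calls += 1
        return super().write(data)


def write_params_per_value(f, params):
    """Baseline shape: one write for the count, then up to two per parameter."""
    f.write(_uint16_pack(len(params)))
    for p in params:
        if p is None:
            f.write(_int32_pack(-1))      # NULL: length -1, no data
        elif p is UNSET:
            f.write(_int32_pack(-2))      # UNSET: length -2, no data
        else:
            f.write(_int32_pack(len(p)))  # length prefix
            f.write(p)                    # data


def write_params_buffered(f, params):
    """Buffered shape: collect every chunk in a list, join once, write once."""
    pack = _int32_pack                    # local alias for the hot loop
    parts = [_uint16_pack(len(params))]
    append = parts.append                 # bound method looked up once
    for p in params:
        if p is None:
            append(pack(-1))
        elif p is UNSET:
            append(pack(-2))
        else:
            append(pack(len(p)))
            append(p)
    f.write(b"".join(parts))


demo_params = [b"a", b"bc", b"def"]       # N = 3 non-null parameters
before, after = CountingBuffer(), CountingBuffer()
write_params_per_value(before, demo_params)
write_params_buffered(after, demo_params)
print(before.calls, after.calls, before.getvalue() == after.getvalue())
```

With three non-null parameters the baseline shape issues 2*3 + 1 = 7 writes and the buffered shape issues one, while both produce the same bytes. NULL and UNSET parameters need only one write each in the baseline, so 2*N + 1 is the upper bound.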

What changed

`cassandra/protocol.py` (2 hunks)

_QueryMessage._write_query_params(): buffer accumulation for the parameter loop, with local variable caching (_int32_pack, _parts_append) to keep the tight loop Cython-friendly.
ExecuteMessage._write_query_params(): removed the unnecessary super() pass-through override; the method is now inherited directly from _QueryMessage.
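To illustrate why the pass-through override is safe to drop, here is a small sketch with hypothetical class names (these are not the driver's classes, and the method signature is an assumption). An override that only forwards to super() produces the same output as plain inheritance, at the cost of one extra Python-level call.

```python
import io


class BaseMessage:
    """Hypothetical stand-in for a base message class."""

    def _write_query_params(self, f, protocol_version):
        f.write(b"params")


class PassThroughMessage(BaseMessage):
    """Override that adds nothing: it only forwards to the parent."""

    def _write_query_params(self, f, protocol_version):
        super()._write_query_params(f, protocol_version)


class InheritingMessage(BaseMessage):
    """No override: the parent's method is used directly."""


out_a, out_b = io.BytesIO(), io.BytesIO()
PassThroughMessage()._write_query_params(out_a, 4)
InheritingMessage()._write_query_params(out_b, 4)
print(out_a.getvalue() == out_b.getvalue())

# Without the override, attribute lookup resolves to the parent's function object.
print(InheritingMessage._write_query_params is BaseMessage._write_query_params)
```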

`tests/unit/test_protocol.py`

Added 14 new test methods in WriteQueryParamsBufferAccumulationTest:

Normal, NULL, UNSET, mixed, empty bytes, empty list, None params
Large vector (768D), many params (50), cross-protocol (v3 vs v4)
Full encode_message round-trip through ProtocolHandler
Single NULL and single UNSET regression tests

`benchmarks/bench_execute_write_params.py` (new)

Standalone benchmark script for reproducibility.

Benchmark results

Environment: Python 3.14, Cython .so compiled, 500K iterations, best of 5 runs.

| Scenario | Baseline (ns/call) | Buffer accum (ns/call) | Speedup |
|---|---|---|---|
| 128D float32 vector (1 param) | 794 | 634 | 1.25x |
| 768D float32 vector (1 param) | 858 | 766 | 1.12x |
| 1536D float32 vector (1 param) | 924 | 834 | 1.11x |
| 10 text columns | 1222 | 940 | 1.30x |
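For readers who want to reproduce numbers of this kind without the benchmark script, the following is a rough sketch of a best-of-N timing helper. It is not the contents of benchmarks/bench_execute_write_params.py; `ns_per_call` and `speedup` are hypothetical names. The speedup column is simply baseline divided by candidate, which the last lines check against the table above.

```python
import timeit


def ns_per_call(fn, number=500_000, repeat=5):
    """Best-of-`repeat` wall time for `number` calls of fn, in ns per call."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def speedup(baseline_ns, candidate_ns):
    """Speedup factor as reported in the table: baseline / candidate."""
    return baseline_ns / candidate_ns


# Recompute the Speedup column from the reported ns/call figures.
reported = {
    "128D": (794, 634),
    "768D": (858, 766),
    "1536D": (924, 834),
    "10 text columns": (1222, 940),
}
for name, (base, cand) in reported.items():
    print(f"{name}: {speedup(base, cand):.2f}x")
```

Taking the minimum over repeats rather than the mean is the usual choice for micro-benchmarks, since noise from the system only ever makes a run slower.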

Comparison with PR #788 (inlining)

| Scenario | Inlining (PR #788) speedup | Buffer accum (this PR) speedup |
|---|---|---|
| 128D vector | 1.05x | 1.25x |
| 768D vector | 1.06x | 1.12x |
| 1536D vector | 1.05x | 1.11x |
| 10 text columns | 1.20x | 1.30x |

Implementation notes

list.append + b"".join benchmarked faster than bytearray +=
Local variable caching avoids repeated attribute lookups in the hot loop
Fully backward-compatible — serialized bytes are identical (verified by 14 tests)
protocol.py is Cython-compiled; optimization benefits both pure Python and Cython paths
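The first two notes can be checked locally with a sketch like the one below. The function names and chunk sizes are made up for illustration, and the timings will vary by interpreter and machine, so no result is claimed here. Both builders produce identical bytes; only the accumulation strategy differs.

```python
import struct
import timeit


def build_with_join(chunks):
    """Accumulate in a list, then join once."""
    parts = []
    append = parts.append      # cache the bound method outside the loop
    for c in chunks:
        append(c)
    return b"".join(parts)


def build_with_bytearray(chunks):
    """Accumulate by extending a bytearray in place."""
    buf = bytearray()
    for c in chunks:
        buf += c
    return bytes(buf)


# Ten (length prefix, data) pairs, roughly the shape of ten small parameters.
chunks = [struct.pack(">i", 4), b"abcd"] * 10
assert build_with_join(chunks) == build_with_bytearray(chunks)

for fn in (build_with_join, build_with_bytearray):
    best = min(timeit.repeat(lambda: fn(chunks), number=20_000, repeat=3))
    print(f"{fn.__name__}: {best / 20_000 * 1e9:.0f} ns/call")
```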

mykaul · 2026-04-04T14:40:00Z

Just putting this here:
Is there any value in a ~0.1us improvement?

Honest answer: on its own, ~100ns per call is tiny. But context matters.

Where it adds up:

High-throughput vector INSERT pipelines doing 100K+ ops/sec: that's ~10ms/sec of CPU saved per core
The write path is single-threaded per connection, so it's on the critical path
Multiply by batch sizes (BatchMessage calls write_value per param per query)

Where it doesn't matter:

Network round-trip to Scylla is typically 100-500us minimum
Serialization of the vector bytes themselves dwarfs the write_value overhead
Real workloads are rarely bottlenecked on parameter serialization

The real argument for merging is code quality, not the 100ns:

Removing the redundant super() pass-through is a clear cleanup
The buffer accumulation is a well-known pattern (fewer writes = better) and the code isn't harder to read
It establishes a pattern for future BatchMessage optimization where the savings would be larger (N queries × M params)

But if the question is "would a user ever notice this in production?", the answer is no, almost certainly not. It's a micro-optimization. Whether it's worth the diff churn and review cost is a judgment call for the maintainers.
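The back-of-envelope numbers in this comment can be spelled out as follows. The inputs are the round figures quoted above (about 100ns saved per call, 100K ops/sec, 100-500us round trip), not new measurements.

```python
SAVED_NS_PER_CALL = 100      # ~0.1us saved per call, as quoted above
OPS_PER_SEC = 100_000        # "100K+ ops/sec" pipeline

saved_ns_per_sec = SAVED_NS_PER_CALL * OPS_PER_SEC    # 10_000_000 ns
saved_ms_per_sec = saved_ns_per_sec / 1_000_000       # 10 ms of CPU per second
core_fraction = saved_ns_per_sec / 1_000_000_000      # 0.01, i.e. 1% of one core

print(f"CPU saved: {saved_ms_per_sec:.0f} ms/sec ({core_fraction:.0%} of a core)")

# Compare the saving with a single request's network round trip.
for rtt_us in (100, 500):
    share = (SAVED_NS_PER_CALL / 1000) / rtt_us
    print(f"RTT {rtt_us}us: saving is {share:.2%} of the round trip")
```

So the saving is about 1% of one core at 100K ops/sec, and between 0.02% and 0.1% of a single request's round-trip latency.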

Replace the per-parameter write_value(f, param) loop in _QueryMessage._write_query_params() with a buffer accumulation approach: list.append + b"".join + single f.write(). This reduces the number of f.write() calls from 2*N+1 to 1, which is significant for vector workloads with large parameters. Also removes the redundant ExecuteMessage._write_query_params() pass-through override to avoid extra MRO lookup per call. Includes 14 unit tests covering normal, NULL, UNSET, empty, large vector, and mixed parameter scenarios for both ExecuteMessage and QueryMessage. Includes a benchmark script (benchmarks/bench_execute_write_params.py).

mykaul marked this pull request as draft April 4, 2026 14:23

mykaul force-pushed the perf/buffer-accum-write-params branch 2 times, most recently from bc1545f to 9b21d5b Compare April 4, 2026 14:36

mykaul force-pushed the perf/buffer-accum-write-params branch from 9b21d5b to f2be2a8 Compare April 4, 2026 14:47

mykaul force-pushed the perf/buffer-accum-write-params branch from f2be2a8 to ac64459 Compare April 4, 2026 15:02

mykaul mentioned this pull request Apr 4, 2026

perf: buffer accumulation in BatchMessage.send_body() #791

Draft

mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/buffer-accum-write-params
