fix: drop stale ReadyForQuery expectation when server enters COPY IN mode #885

Open
NikolayS wants to merge 1 commit into pgdogdev:main from NikolayS:claude/fix-copy-in-readyforquery-ZpVXL

NikolayS commented Apr 10, 2026

Bug

When a client sends a COPY FROM STDIN statement via extended query protocol (Parse+Bind+Execute+Sync), pgdog adds a ReadyForQuery expectation to its protocol state queue for that Sync. But PostgreSQL ignores Sync during COPY IN mode (protocol spec §55.2.6) and never sends ReadyForQuery for it. The stale entry stays in the queue, done() never returns true, and the connection is never returned to the pool.

tokio-postgres, a widely used Rust PostgreSQL client, uses exactly this pattern. It sends COPY via the extended protocol as Parse+Bind+Execute+Sync, then CopyData..., then CopyDone+Sync. PostgreSQL ignores the first Sync because it is already in COPY IN mode by then, so it produces only one ReadyForQuery instead of the two that pgdog expects.
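The mismatch can be illustrated with a small model of the server side. This is only a sketch under simplifying assumptions: the `Frontend` enum, `ready_for_query_count`, and `tokio_postgres_style_copy` are hypothetical names invented for illustration, and the model treats Execute of a COPY FROM STDIN statement as the point where the server enters COPY IN mode.

```rust
/// Frontend messages relevant to this bug (hypothetical, simplified model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Frontend {
    Parse,
    Bind,
    Execute,
    Sync,
    CopyData,
    CopyDone,
}

/// Counts the ReadyForQuery messages a server would send for `messages`,
/// assuming it ignores every Sync received while in COPY IN mode.
/// `is_copy_from_stdin` says whether the executed statement is COPY FROM STDIN.
fn ready_for_query_count(messages: &[Frontend], is_copy_from_stdin: bool) -> usize {
    let mut copy_in = false;
    let mut count = 0;
    for msg in messages {
        match msg {
            // Assumption: Execute of the COPY statement switches to COPY IN mode.
            Frontend::Execute if is_copy_from_stdin => copy_in = true,
            // CopyDone ends COPY IN mode.
            Frontend::CopyDone => copy_in = false,
            // Only a Sync outside COPY IN mode is answered with ReadyForQuery.
            Frontend::Sync if !copy_in => count += 1,
            _ => {}
        }
    }
    count
}

/// The message order the description attributes to tokio-postgres.
fn tokio_postgres_style_copy() -> Vec<Frontend> {
    vec![
        Frontend::Parse,
        Frontend::Bind,
        Frontend::Execute,
        Frontend::Sync, // arrives during COPY IN mode, so it is ignored
        Frontend::CopyData,
        Frontend::CopyDone,
        Frontend::Sync, // the only Sync that is answered
    ]
}
```

In this model the sequence contains two Sync messages, but `ready_for_query_count(&tokio_postgres_style_copy(), true)` yields one ReadyForQuery, which is the gap between what the proxy queues and what the server sends.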

Exact message sequence that causes desync

# Client → pgdog (handle side):
handle(Parse)     → add('1')                    queue: [ParseComplete]
handle(Bind)      → add('2')                    queue: [ParseComplete, BindComplete]
handle(Execute)   → add(ExecutionCompleted)      queue: [ParseComplete, BindComplete, ExecutionCompleted]
handle(Sync)      → add('Z')                    queue: [ParseComplete, BindComplete, ExecutionCompleted, RFQ]

# PostgreSQL → pgdog (forward side):
forward('1')      → pops ParseComplete           queue: [BindComplete, ExecutionCompleted, RFQ]
forward('2')      → pops BindComplete             queue: [ExecutionCompleted, RFQ]
forward('G')      → 'G'→Copy, pops ExecutionCompleted
                    (not RFQ, no push-back)       queue: [RFQ]
                    prepend('G')→Copy             queue: [Copy, RFQ]  ← STALE

# Client → pgdog (copy data):
handle(CopyDone)  → action('c')→Copy, pops Copy  queue: [RFQ]
handle(Sync)      → add('Z')                     queue: [RFQ, RFQ]

# PostgreSQL → pgdog (one RFQ, not two):
forward('C')      → pops RFQ, but C≠RFQ → push back  queue: [RFQ, RFQ]
forward('Z')      → pops one RFQ                 queue: [RFQ]  ← STALE FOREVER
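The trace above can be replayed with a minimal queue model. This is a sketch, not pgdog's actual code: `Expect`, `StateQueue`, and `replay_copy_trace` are hypothetical names, and the push-back rule is inferred only from the steps shown in the trace (a popped ReadyForQuery is put back when the incoming message is not ReadyForQuery).

```rust
use std::collections::VecDeque;

/// Entries the proxy expects from the server (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Expect {
    ParseComplete,
    BindComplete,
    ExecutionCompleted,
    Copy,
    ReadyForQuery,
}

#[derive(Debug, Default)]
struct StateQueue {
    queue: VecDeque<Expect>,
}

impl StateQueue {
    /// Client -> proxy: queue the response each request should produce.
    fn handle(&mut self, code: char) {
        match code {
            'P' => self.queue.push_back(Expect::ParseComplete),
            'B' => self.queue.push_back(Expect::BindComplete),
            'E' => self.queue.push_back(Expect::ExecutionCompleted),
            'S' => self.queue.push_back(Expect::ReadyForQuery),
            // CopyDone consumes the pending Copy entry.
            'c' => {
                if self.queue.front() == Some(&Expect::Copy) {
                    self.queue.pop_front();
                }
            }
            _ => {}
        }
    }

    /// Server -> proxy: consume the expectation the message satisfies.
    fn forward(&mut self, code: char) {
        let incoming = match code {
            '1' => Expect::ParseComplete,
            '2' => Expect::BindComplete,
            'C' => Expect::ExecutionCompleted,
            'G' => Expect::Copy,
            'Z' => Expect::ReadyForQuery,
            _ => return,
        };
        if let Some(front) = self.queue.pop_front() {
            // A ReadyForQuery entry is only consumed by a real ReadyForQuery.
            if front == Expect::ReadyForQuery && incoming != Expect::ReadyForQuery {
                self.queue.push_front(front);
            }
        }
        // CopyInResponse puts the proxy into copy mode (no RFQ cleanup here).
        if incoming == Expect::Copy {
            self.queue.push_front(Expect::Copy);
        }
    }

    fn done(&self) -> bool {
        self.queue.is_empty()
    }
}

/// Replays the exact message order from the trace, without the fix.
fn replay_copy_trace() -> StateQueue {
    let mut state = StateQueue::default();
    for code in "PBES".chars() {
        state.handle(code);
    }
    for code in "12G".chars() {
        state.forward(code);
    }
    for code in "cS".chars() {
        state.handle(code);
    }
    for code in "CZ".chars() {
        state.forward(code);
    }
    state
}
```

After `replay_copy_trace()`, the model's queue still holds one ReadyForQuery entry and `done()` is false, matching the final line of the trace.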

Consequences

  • done() never returns true
  • Connection is never returned to the pool (query.rs:282-289)
  • rollback() fails with RollbackFailed
  • Effectively a connection leak per COPY operation

Verified end-to-end

Integration test using tokio-postgres::copy_in() through pgdog (integration/rust/tests/tokio_postgres/copy.rs):

  • WITHOUT fix: FATAL: query timeout at sink.finish() — pgdog's state machine is desynced and can't complete the COPY
  • WITH fix: COPY completes, subsequent SELECT count(*) returns correct results (passes in 0.09s)

Fix

When forward() receives CopyInResponse ('G'), call remove_one_rfq() to drop the ReadyForQuery that will never arrive. This makes the proxy resilient to clients that send Sync with the initial Parse+Bind+Execute for COPY statements.

Note: pgdog already handles COPY via extended protocol in prepared_statements.rs: CopyDone, CopyFail, and CopyData in handle() (lines 180-188), and CopyInResponse ('G') in forward() (line 229). The fix adds one call to the existing 'G' handler.
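A rough sketch of the idea on a bare queue follows. The real `remove_one_rfq()` signature is not shown in this description, so the version here is a hypothetical stand-in that drops the first pending ReadyForQuery; `Expect` and `on_copy_in_response` are likewise invented for illustration.

```rust
use std::collections::VecDeque;

/// Expected server responses (hypothetical, reduced to what this step needs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Expect {
    ExecutionCompleted,
    Copy,
    ReadyForQuery,
}

/// Hypothetical stand-in for remove_one_rfq(): removes the first pending
/// ReadyForQuery entry, if any, and reports whether one was removed.
fn remove_one_rfq(queue: &mut VecDeque<Expect>) -> bool {
    match queue.iter().position(|e| *e == Expect::ReadyForQuery) {
        Some(pos) => queue.remove(pos).is_some(),
        None => false,
    }
}

/// Models the 'G' (CopyInResponse) branch with the fix applied.
fn on_copy_in_response(queue: &mut VecDeque<Expect>) {
    // As in the trace, CopyInResponse consumes the pending ExecutionCompleted.
    if queue.front() == Some(&Expect::ExecutionCompleted) {
        queue.pop_front();
    }
    // The fix: the Sync sent with Execute will never be answered, so its
    // ReadyForQuery expectation is dropped before entering copy mode.
    remove_one_rfq(queue);
    queue.push_front(Expect::Copy);
}
```

Starting from the trace's state just before 'G', `[ExecutionCompleted, ReadyForQuery]`, this leaves `[Copy]`. CopyDone then empties the queue, and the ReadyForQuery queued by the final Sync is matched by the single one the server sends.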

Tests

  • Unit test test_copy_in_with_client_double_sync — exercises the full sequence through PreparedStatements::forward() (the real code path), asserts clean state after the COPY cycle completes
  • Integration test test_copy_in_extended_protocol — end-to-end test using tokio-postgres::copy_in() through pgdog, verifying both COPY completion and subsequent query success

https://claude.ai/code/session_01PQvrbw2xJHgQBXtASWHFcv

…mode

When a client sends Bind+Execute+Sync for a COPY FROM STDIN statement,
pgdog adds a ReadyForQuery expectation for that Sync.  But PostgreSQL
ignores Sync during COPY IN mode (protocol spec §55.2.6) and never sends
ReadyForQuery for it.  The stale entry stays in the queue, done() never
returns true, and the connection is never returned to the pool.

Call remove_one_rfq() in forward() when we see CopyInResponse ('G') to
drop the ReadyForQuery that will never arrive.

Verified with end-to-end integration test using tokio-postgres copy_in():
- WITHOUT fix: query timeout - CopyDone hangs because state machine is desynced
- WITH fix: COPY completes, subsequent queries work normally

https://claude.ai/code/session_01PQvrbw2xJHgQBXtASWHFcv