Skip to content

Claude/fix session reconnection zds mz#910

Open
marcodejongh wants to merge 12 commits intomainfrom
claude/fix-session-reconnection-ZDSMz
Open

Claude/fix session reconnection zds mz#910
marcodejongh wants to merge 12 commits intomainfrom
claude/fix-session-reconnection-ZDSMz

Conversation

@marcodejongh
Copy link
Owner

No description provided.

claude and others added 11 commits March 4, 2026 10:48
…overlay

When iOS Safari backgrounds the app, the OS silently kills the WebSocket
connection. Previously the UI stayed fully interactive while changes silently
failed to sync. This adds real-time WebSocket connection state tracking via
graphql-ws on.connected/on.closed callbacks, a visibilitychange handler to
proactively detect dead sockets on iOS foreground return, and a
"Reconnecting..." overlay on the queue control bar that blocks interaction
until the connection recovers.

https://claude.ai/code/session_016bdrJ2mAmjzSmZ2z6vWGN9
The on.connected callback was setting isWebSocketConnected=true immediately,
before handleReconnect could finish its async joinSession/delta-sync path.
This caused isReconnecting to become false prematurely, re-enabling queue
mutations while stale state was still being replayed.

Fix: onConnectionStateChange now receives an isReconnect flag. On
reconnections, the persistent session defers setting isWebSocketConnected=true
until handleReconnect's finally block, keeping the reconnecting overlay
and viewOnlyMode active throughout the entire resync.

Also adds iOS background connection loss documentation to
docs/websocket-implementation.md (section 7 under Failure States).

https://claude.ai/code/session_016bdrJ2mAmjzSmZ2z6vWGN9
- graphql-client.test.ts: Tests onConnectionStateChange callback fires
  with correct (connected, isReconnect) flags across initial connect,
  disconnect, and reconnection sequences. Verifies callback ordering
  (stateChange fires before onReconnect).

- connection-state.test.ts: Tests isReconnecting derivation truth table,
  reconnect lock timing (isReconnecting stays true during resync), and
  visibilitychange debounce behavior (300ms delay, rapid-fire cancellation,
  hidden state ignored).

https://claude.ai/code/session_016bdrJ2mAmjzSmZ2z6vWGN9
The full-page overlay blocked all user interaction. Replace it with a
bottom-center Snackbar/Alert that shows "Reconnecting..." with a spinner
and automatically disappears when the connection is re-established.
Queue mutations are still prevented by viewOnlyMode during reconnection.

https://claude.ai/code/session_016bdrJ2mAmjzSmZ2z6vWGN9
Break the BoardSessionBridge reactivation loop that caused infinite
connect/disconnect cycles after join failures. Track failed session IDs
so the bridge doesn't re-activate a session that was just cleared.

Also harden getSessionMembers() against partial Redis data (missing
username would violate String! schema constraint) and add production-safe
logging to the joinSession resolver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "detect failed session" effect in BoardSessionBridge was marking
sessions as failed on initial mount when activeSession is null (not yet
set), preventing the activation effect from ever running. Now only marks
a session as failed after it was previously active and then cleared.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The EVENTS_REPLAY query was using GraphQL field aliases (addedItem: item,
currentItem: item), so the response already had aliased field names. But
transformToSubscriptionEvent was reading event.item (the un-aliased name),
which was undefined at runtime. Remove the aliases so the response matches
the QueueEvent type, and the transform function correctly remaps the fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nnection

Extract the 400-line connection useEffect from persistent-session-context.tsx
into two testable, non-React classes:

- ConnectionStateMachine: explicit FSM replacing 6+ scattered boolean state
  variables, eliminating inconsistent state combinations
- SessionConnection: encapsulates client lifecycle, session join/leave,
  subscription setup, reconnection with delta sync, and transient retry logic

This reduces persistent-session-context.tsx from 1418 to ~990 lines and makes
the connection orchestration independently testable.

https://claude.ai/code/session_01TimjGNkUn4VMvRCK4kP76s
Generated patch from the WebSocket ConnectionStateMachine + SessionConnection
refactor since git push is returning 403.

https://claude.ai/code/session_01TimjGNkUn4VMvRCK4kP76s
- Remove websocket-refactor.patch (77KB development artifact)
- Add resyncInProgress guard to prevent concurrent resyncs when
  triggerResync() is called rapidly (e.g., visibility + corruption)
- Reset state to IDLE before scheduleRetry() so the retry timer's
  connect() call isn't blocked by the duplicate-connection guard
- Gate CONNECTED transition on successful joinSession in handleReconnect()
  so transient reconnect failures don't falsely mark the session as connected

https://claude.ai/code/session_01FRKMbaqzmvrCb5CKHnGmBN
@claude
Copy link

claude bot commented Mar 4, 2026

Claude Review

Ready to merge - Minor issues noted below, but nothing blocking. This is a well-structured refactor that extracts connection logic into a testable SessionConnection class and properly handles iOS WebSocket reconnection with user feedback.

Minor Issues

  1. session-connection.ts:2447 - username is initialized to '' instead of undefined. If the backend treats empty string differently from null/undefined, this could cause issues. Consider initializing to undefined:

    private username: string | undefined; // instead of = ''
  2. Missing tests for SessionConnection class - The new 587-line session-connection.ts handles critical reconnection logic but lacks direct unit tests. Only the state machine and graphql-client have tests. Consider adding tests for reconnection scenarios.

  3. persistent-session-context.tsx:1942 - The eslint-disable for react-hooks/exhaustive-deps omits handleQueueEvent and handleSessionEvent from dependencies. While intentional to prevent reconnection, document this design decision in a comment near the disable.

Documentation

The docs/websocket-implementation.md update is thorough and accurately describes the iOS reconnection handling mechanism.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eacdf1a372

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +304 to +307
if (this.disposed || !sessionData) {
// joinSession failed — don't transition to CONNECTED
if (DEBUG) console.log('[SessionConnection] Reconnect joinSession failed, staying in current state');
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Transition out of RECONNECTING when rejoin fails

When joinSession() returns null during handleReconnect, this branch returns without any state transition or session cleanup, so the connection can stay stuck in RECONNECTING indefinitely after a failed rejoin (for example, if the session was ended while the client was offline). In this commit, isReconnecting is now used to force view-only mode, so this leaves users permanently locked out of queue mutations until a full page reload or manual URL/state reset.

Useful? React with 👍 / 👎.

Comment on lines +69 to +70
if (!activeSession && sessionIdFromUrl && hadActiveSessionRef.current) {
failedSessionIdsRef.current.add(sessionIdFromUrl);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow retrying same session ID after transient failures

This adds the current sessionIdFromUrl to a permanent in-memory failed set whenever activeSession is cleared, and activation is then unconditionally skipped for that same URL param. Because entries are only removed when the URL session param changes, a temporary backend outage that clears activeSession makes the same invite URL unrecoverable in-tab even after the backend is healthy again.

Useful? React with 👍 / 👎.

- Replace permanent failed session set with time-based cooldown (30s)
  so transient backend outages become retryable without page reload
- Transition to FAILED and clear session when handleReconnect's
  joinSession() returns null, preventing stuck RECONNECTING state
- Fix username initialized to '' instead of undefined to avoid
  backend treating empty string differently from missing value
- Improve eslint-disable comment documenting why handleQueueEvent
  and handleSessionEvent are intentionally omitted from deps

https://claude.ai/code/session_01FRKMbaqzmvrCb5CKHnGmBN
@claude
Copy link

claude bot commented Mar 4, 2026

Claude Review

⚠️ Needs attention - Missing unit tests for the core SessionConnection class (592 lines of complex connection orchestration logic).

Issues

  1. Missing tests for SessionConnection (packages/web/app/components/persistent-session/session-connection.ts) - The new tests cover the simpler ConnectionStateMachine and derivation logic, but the core 592-line class handling reconnection, delta sync, subscriptions, and retry logic has no unit tests. Given the complexity of WebSocket state management, this is a significant gap.

  2. Reduced type safety (persistent-session-context.tsx:85-122) - toClimbQueueItemInput parameter changed from LocalClimbQueueItem to unknown with internal cast. Consider keeping the typed signature and having the callback interface use the proper type.

  3. Unbounded Map growth (board-session-bridge.tsx:68) - failedSessionTimestampsRef Map is never pruned. While unlikely to be significant in practice, sessions older than the cooldown could be periodically cleaned up.

  4. eslint-disable exhaustive-deps (persistent-session-context.tsx:2077-2078) - While documented, intentionally omitting handleQueueEvent and handleSessionEvent from deps is fragile. If these callbacks ever become unstable (e.g., deps change), the connection won't re-establish properly. Consider using useRef for the handlers or extracting them to SessionConnection entirely.

Documentation

Documentation appropriately updated - new section added to docs/websocket-implementation.md for iOS Background Connection Loss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants