
Spurious JIT channel Fix #10

Open

amackillop wants to merge 3 commits into lsp-0.2.0 from austin_spurious-jit-channel-fix-2

Conversation

@amackillop

This does two things. The first commit simplifies the flow by limiting the places where we gate on channel usability and replaces a lingering peer_connected check. The second adds a set to track pending channel opens, preventing unnecessary JIT channels when one is already in flight.

Channel usability (is_usable) was checked at four separate points:
htlc_intercepted, peer_connected, process_pending_htlcs, and
calculate_htlc_actions_for_peer. Each had its own deferral logic,
and they had to coordinate (the timer skipped channel opens
assuming peer_connected already handled them). This coordination
broke: PR #9 made peer_connected call process_htlcs_for_peer
during reestablish, which saw an empty capacity map because
non-usable channels were filtered out, and emitted a spurious
OpenChannel on every reconnect with a pending HTLC.

Move the usability check to execute_htlc_actions, right before
forward_intercepted_htlc. If no usable channel exists, the
forward is skipped and the HTLC stays in store for the timer to
retry. htlc_intercepted, peer_connected, and process_pending_htlcs
now all call process_htlcs_for_peer unconditionally.

calculate_htlc_actions_for_peer includes all channels in the
capacity map regardless of is_usable, so it correctly sees that a
reestablishing channel has sufficient capacity and does not request
a spurious new channel.
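The deferral described above can be sketched as follows. This is a minimal model, not the real implementation: PeerId, Channel, and Store stand in for the actual LDK-backed types, and only the gate-at-execute-time behavior is shown.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the real types.
type PeerId = u8;

#[derive(Clone, Copy)]
struct Channel {
    is_usable: bool,
}

struct Store {
    // peer -> intercepted HTLC amounts still waiting to forward
    pending_htlcs: HashMap<PeerId, Vec<u64>>,
}

impl Store {
    /// The usability gate, moved here from htlc_intercepted,
    /// peer_connected, and the timer. Forwards only when the peer has
    /// a usable channel; otherwise the HTLCs stay in the store for the
    /// next timer tick to retry. Returns the number forwarded.
    fn execute_htlc_actions(&mut self, peer: PeerId, channels: &[Channel]) -> usize {
        if !channels.iter().any(|c| c.is_usable) {
            return 0; // no usable channel yet: keep HTLCs, retry later
        }
        self.pending_htlcs.remove(&peer).map(|h| h.len()).unwrap_or(0)
    }
}
```

Because the check happens only here, the callers (htlc_intercepted, peer_connected, process_pending_htlcs) can all invoke the processing path unconditionally without coordinating deferral logic among themselves.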

Change the pre-forward guard from is_peer_connected to
has_usable_channel, which covers the disconnect+reconnect race
where the peer is connected but the channel has not finished
reestablishing.

After the previous commit moved usability checks to execute time,
the timer can call process_htlcs_for_peer repeatedly while a
channel is still opening. calculate_htlc_actions_for_peer sees no
is_channel_ready channels and requests a new one each time,
producing duplicate OpenChannel events.

Add a pending_channel_opens set (RwLock<HashSet<PublicKey>>).
execute_htlc_actions inserts the peer when it emits OpenChannel,
and channel_ready removes it. If the set already contains the
peer, the OpenChannel is suppressed.
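The dedup guard can be sketched like this. PeerId stands in for bitcoin::secp256k1::PublicKey, and the struct holds only the guard set; method names are illustrative.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// Stand-in for PublicKey in this sketch.
type PeerId = u8;

struct JitChannelGuard {
    pending_channel_opens: RwLock<HashSet<PeerId>>,
}

impl JitChannelGuard {
    fn new() -> Self {
        Self { pending_channel_opens: RwLock::new(HashSet::new()) }
    }

    /// Returns true if an OpenChannel should be emitted for this peer,
    /// i.e. no open is already in flight. HashSet::insert returns false
    /// when the peer is already present, which suppresses the duplicate.
    fn try_begin_open(&self, peer: PeerId) -> bool {
        self.pending_channel_opens.write().unwrap().insert(peer)
    }

    /// Called from channel_ready (and from failure cleanup) so a later
    /// HTLC for this peer can trigger a fresh open.
    fn finish_open(&self, peer: &PeerId) {
        self.pending_channel_opens.write().unwrap().remove(peer);
    }
}
```

Relying on HashSet::insert's return value makes the check-and-mark a single operation under the write lock, so two concurrent timer ticks cannot both decide to open.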

calculate_htlc_actions_for_peer now filters by is_channel_ready
instead of including all channels. Channels still opening
(is_channel_ready=false) report outbound_capacity_msat but reject
forwards with "Channel is still opening", consuming the
InterceptId and losing the HTLC. These are zero-conf channels, so
on-chain confirmation is not the issue; the channel simply hasn't
finished its opening handshake yet. Reestablishing channels
(is_channel_ready=true, is_usable=false) can forward once
reestablish completes and are included, preserving the
spurious-open fix from the previous commit.
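The filter described above can be sketched as follows, with a trimmed-down ChannelDetails carrying only the three fields the argument turns on (in the real code these come from LDK's ChannelDetails):

```rust
// Minimal stand-in for the relevant ChannelDetails fields.
struct ChannelDetails {
    is_channel_ready: bool,
    is_usable: bool,
    outbound_capacity_msat: u64,
}

/// Capacity the peer can actually forward over. Channels that finished
/// the open handshake count even when temporarily not usable
/// (mid-reestablish), since they can forward once reestablish
/// completes. Still-opening channels (is_channel_ready=false) are
/// excluded: they report capacity but would reject the forward and
/// burn the InterceptId.
fn forwardable_capacity_msat(channels: &[ChannelDetails]) -> u64 {
    channels
        .iter()
        .filter(|c| c.is_channel_ready)
        .map(|c| c.outbound_capacity_msat)
        .sum()
}
```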
```rust
pub fn channel_ready(
    &self, counterparty_node_id: &PublicKey,
) -> Result<(), APIError> {
    self.pending_channel_opens.write().unwrap().remove(counterparty_node_id);
```

If this never executes, the node could end up stuck indefinitely and unable to forward HTLCs.

We should probably add a timeout, for example removing it after a minute.

Also, we likely need to listen for a channel_failed (or similar) event. If the channel fails to open, we should remove it from pending_channel_opens as well.

Author


I think listening for the failure side should be sufficient or can this be stuck in between somehow?


@martinsaposnic Mar 24, 2026


What happens with long-lived nodes that never get a peer_disconnected but can still hit a channel failure and get stuck?

If a channel open is in flight and the peer disconnects, the
open is dead (LDK can't complete the funding handshake without a
connected peer). Without this cleanup, the pending_channel_opens
set would block future OpenChannel events for that peer
permanently, since channel_ready never fires for a failed open.

This does not reintroduce the duplicate-open problem from the
previous commit. That bug was caused by the timer firing
repeatedly while the peer stays connected and the channel is
still opening. A disconnect/reconnect is a genuine restart of
the channel lifecycle, so re-emitting OpenChannel is correct.
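A minimal sketch of that cleanup, with PeerId standing in for PublicKey and the surrounding LSP state reduced to just the guard set (method name mirrors the lifecycle hook it describes):

```rust
use std::collections::HashSet;
use std::sync::RwLock;

// Stand-in for PublicKey in this sketch.
type PeerId = u8;

struct Lsp {
    pending_channel_opens: RwLock<HashSet<PeerId>>,
}

impl Lsp {
    /// A disconnect kills any in-flight open (the funding handshake
    /// cannot complete without a connected peer), so clear the guard.
    /// The reconnect path may then legitimately emit a fresh
    /// OpenChannel rather than being blocked forever.
    fn peer_disconnected(&self, peer: &PeerId) {
        self.pending_channel_opens.write().unwrap().remove(peer);
    }
}
```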
@amackillop amackillop force-pushed the austin_spurious-jit-channel-fix-2 branch from 5bfbc90 to dc10470 Compare March 24, 2026 11:57