fix: resolve Dandelion++ ABBA deadlock in CheckDandelionEmbargoes (Bug #29) #394

Merged
JaredTate merged 1 commit into DigiByte-Core:feature/digidollar-v1 from JohnnyLawDGB:fix/dandelion-abba-deadlock
Apr 2, 2026

@JohnnyLawDGB

Summary

  • Bug #29: sendtoaddress hangs forever at "Processing Dandelion relay", shutdown hangs on threadDandelionShuffle.join(), and RPC calls time out. Reported by DanGB (Windows 11, RC26, ~1 week uptime). Probabilistic: requires the shuffle timer and embargo check timer to race.
  • Root cause: ABBA deadlock between m_nodes_mutex and m_dandelion_embargo_mutex. The established lock ordering is m_nodes_mutex → m_dandelion_embargo_mutex (used by DandelionShuffle, CloseDandelionConnections, DisconnectNodes). CheckDandelionEmbargoes() violated this by holding m_dandelion_embargo_mutex first, then calling usingDandelion() and localDandelionDestinationPushInventory(), which acquire m_nodes_mutex internally.
  • Fix: Restructure CheckDandelionEmbargoes() into two phases — scan the embargo map under the embargo lock and collect txids needing stem routing, then release the lock and perform routing (which safely acquires m_nodes_mutex). Single file change, zero changes to dandelion.cpp/net.cpp/net.h.

Deadlock diagram

Thread A (DandelionShuffle):     LOCK(m_nodes_mutex)            → LOCK(m_dandelion_embargo_mutex)
Thread B (CheckDandelionEmbargoes): LOCK(m_dandelion_embargo_mutex) → LOCK(m_nodes_mutex) via usingDandelion()

Both threads block forever waiting on the lock the other holds.
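
The inversion in the diagram can be caught without waiting for a real hang. The following is a minimal sketch of a lock-order recorder, assuming locks are identified by name; `LockOrderChecker` and `Acquire` are hypothetical helpers for illustration and are not part of DigiByte Core or this PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not DigiByte Core code): remembers every
// "held -> next" ordering it has seen and flags the reverse ordering.
class LockOrderChecker {
public:
    // Call when a thread that already holds the locks in `held` is about
    // to take `next`. Returns false if `next -> held` was recorded earlier,
    // which means two code paths disagree on the order (ABBA risk).
    bool Acquire(const std::vector<std::string>& held, const std::string& next) {
        bool ok = true;
        for (const auto& h : held) {
            if (m_seen.count(std::make_pair(next, h)) > 0) ok = false;
            m_seen.emplace(h, next);
        }
        return ok;
    }

private:
    std::set<std::pair<std::string, std::string>> m_seen;
};
```

Feeding it Thread A's order (m_nodes_mutex, then m_dandelion_embargo_mutex) followed by Thread B's order makes the second call return false, which is the condition the fix removes.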

Lock acquisition map (5 deadlock points identified)

| Path | File:Line | Lock #1 | Lock #2 | Risk |
|---|---|---|---|---|
| DandelionShuffle() | dandelion.cpp:344→357 | m_nodes_mutex | m_dandelion_embargo_mutex | Correct order |
| CloseDandelionConnections() | dandelion.cpp:211→292 | m_nodes_mutex | m_dandelion_embargo_mutex | Correct order |
| DisconnectNodes() | net.cpp:1952→CloseDandelionConnections | m_nodes_mutex | m_dandelion_embargo_mutex | Correct order |
| CheckDandelionEmbargoes()→usingDandelion() | net_processing.cpp:1630→dandelion.cpp:403 | m_dandelion_embargo_mutex | m_nodes_mutex | DEADLOCK (fixed) |
| CheckDandelionEmbargoes()→localDandelionDestinationPushInventory() | net_processing.cpp:1630→dandelion.cpp:68 | m_dandelion_embargo_mutex | m_nodes_mutex | DEADLOCK (fixed) |

Note: Commit 0caa0e84a1 (Feb 10) made the deadlock more likely by ensuring CloseDandelionConnections() always acquires the embargo mutex, eliminating any chance of the lock acquisition being skipped.

Test plan

  • dandelion_tests unit tests: 2/2 pass
  • Build clean on RC26 (v9.26.0-rc26 tag + fix) — no warnings
  • Build clean on HEAD (11cca7d + fix) — no warnings
  • Rapid send: 10 transactions in <2 seconds, all instant, no hangs
  • Embargo → mempool transitions: 15 successful transitions confirmed in debug log
  • Shutdown with active stempool: 3 txs sent, immediate stop — clean exit, no hang
  • Tested on testnet19 with 8-10 peers, Dandelion fully active (stempool, embargo, shuffle all running)
  • p2p_dandelion.py functional test (blocked by missing digibyte_scrypt Python module — pre-existing environment issue)

🤖 Generated with Claude Code

…DigiByte-Core#29)

CheckDandelionEmbargoes() held m_dandelion_embargo_mutex while calling
usingDandelion() and localDandelionDestinationPushInventory(), both of
which acquire m_nodes_mutex internally. This violates the established
lock ordering (m_nodes_mutex → m_dandelion_embargo_mutex) used by
DandelionShuffle() and CloseDandelionConnections(), creating an ABBA
deadlock when the shuffle timer and embargo check fire concurrently.

Symptoms: sendtoaddress hangs at "Processing Dandelion relay", RPC
timeout, shutdown hangs on threadDandelionShuffle.join(). Reported by
DanGB on Windows 11 (RC26, ~1 week uptime). Probabilistic — requires
two timer threads to race.

Fix: restructure CheckDandelionEmbargoes() into two phases:
  Phase 1: scan embargo map under m_dandelion_embargo_mutex, collect
           txids needing stem routing into a local vector.
  Phase 2: release embargo lock, then perform stem routing (which
           acquires m_nodes_mutex safely), re-acquire embargo lock
           briefly to mark each tx as routed.

usingDandelion() moved before the embargo lock acquisition. The bool
may be one cycle stale — no correctness impact.
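
The two-phase shape described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: `EmbargoEntry`, `RouteViaStem`, `CheckEmbargoesTwoPhase`, and the `g_` globals are hypothetical stand-ins, and plain `std::mutex` replaces the project's own locking macros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Illustrative stand-ins only; none of these names come from the codebase.
struct EmbargoEntry {
    bool needs_stem_routing;
    bool routed;
};

std::mutex g_nodes_mutex;    // plays the role of m_nodes_mutex
std::mutex g_embargo_mutex;  // plays the role of m_dandelion_embargo_mutex
std::map<std::uint64_t, EmbargoEntry> g_embargoes;  // guarded by g_embargo_mutex

// Stand-in for a routing call that takes the nodes lock internally.
bool RouteViaStem(std::uint64_t /*txid*/) {
    std::lock_guard<std::mutex> lock(g_nodes_mutex);
    return true;  // pretend a stem destination accepted the transaction
}

std::size_t CheckEmbargoesTwoPhase() {
    // Phase 1: scan under the embargo lock only and copy out the txids.
    std::vector<std::uint64_t> to_route;
    {
        std::lock_guard<std::mutex> lock(g_embargo_mutex);
        for (const auto& kv : g_embargoes) {
            if (kv.second.needs_stem_routing && !kv.second.routed) {
                to_route.push_back(kv.first);
            }
        }
    }  // embargo lock released before any routing happens

    // Phase 2: route with no embargo lock held, then re-lock briefly to mark.
    std::size_t routed = 0;
    for (std::uint64_t txid : to_route) {
        if (!RouteViaStem(txid)) continue;
        std::lock_guard<std::mutex> lock(g_embargo_mutex);
        auto it = g_embargoes.find(txid);
        if (it == g_embargoes.end()) continue;  // entry removed in the meantime
        it->second.routed = true;
        ++routed;
    }
    return routed;
}
```

Because the embargo lock is dropped between the phases, an entry can disappear before it is marked, so the sketch looks it up again instead of keeping an iterator from phase 1. The two mutexes are never held at the same time on this path.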

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JaredTate

Looks good. Thank you for this!

@JaredTate JaredTate merged commit a7f23b9 into DigiByte-Core:feature/digidollar-v1 Apr 2, 2026
1 of 2 checks passed
JaredTate added a commit that referenced this pull request Apr 2, 2026
…ull commit list

- Add Bug #29 (Dandelion++ ABBA deadlock fix, PR #394 by JohnnyLawDGB)
- Add Bug #33 (wrong mint tooltip limits + silent decimal truncation)
- Add Post-Quantum Cryptography plan documentation section
- Update commit list with all commits since RC26
- Update test suite status to reflect full validation