
minimization in mempool #70

Open

SIDDHANTCOOKIE wants to merge 4 commits into StabilityNexus:main from SIDDHANTCOOKIE:refactor/mempool-o1

Conversation


@SIDDHANTCOOKIE commented Mar 23, 2026

Addressed Issues:

This PR refactors the `Mempool` data structure from a linear list to a nested dictionary (`dict[sender][nonce] = tx`), reducing transaction insertion and lookup from O(N) to O(1).

  • Replaced the expensive linear scanning loop inside `add_transaction` with direct dictionary lookups.
  • Removed the redundant `_seen_tx_ids` set. The nested dictionary inherently guarantees uniqueness, serving as a single source of truth.
  • The previous O(N) implementation was a potential DoS vulnerability, as adding a new transaction required a linear scan of every item in the pool. By switching to O(1) dictionary assignment, the node's inbound networking performance remains securely flat regardless of mempool size.
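The layout described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `Tx` fields and the `Mempool` method names are assumptions based on this description, and the replacement rule (a differing transaction wins only if strictly newer) follows the summary further down this page.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    sender: str
    nonce: int
    tx_id: str
    timestamp: int


class Mempool:
    def __init__(self, max_size: int = 1000) -> None:
        # _pool[sender][nonce] -> Tx: (sender, nonce) uniqueness is structural,
        # so no separate seen-ID set is needed.
        self._pool: dict[str, dict[int, Tx]] = {}
        self._size = 0
        self.max_size = max_size

    def add_transaction(self, tx: Tx) -> bool:
        # Probe with .get() before mutating, so a full-pool rejection
        # never leaves behind an empty per-sender bucket.
        existing = self._pool.get(tx.sender, {}).get(tx.nonce)
        if existing is not None:
            if existing.tx_id == tx.tx_id or tx.timestamp <= existing.timestamp:
                return False              # duplicate, or not strictly newer
        elif self._size >= self.max_size:
            return False                  # capacity applies to new slots only
        self._pool.setdefault(tx.sender, {})[tx.nonce] = tx
        if existing is None:
            self._size += 1               # replacements keep the count flat
        return True

    def __len__(self) -> int:
        return self._size
```

Every path through `add_transaction` is a constant number of dictionary operations, which is what keeps inbound performance flat as the pool grows.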

Screenshots/Recordings:

TODO: If applicable, add screenshots or recordings that demonstrate the interface before and after the changes.

Additional Notes:

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • Refactor

    • Mempool redesigned to track transactions per sender/nonce with an explicit size counter.
    • Identical transactions are rejected; differing transactions replace only if strictly newer.
    • Capacity checks apply only when adding new sender/nonce entries; replacements bypass full-capacity rejection.
    • Block selection now advances per-sender nonces and picks the earliest available timestamps across senders instead of slicing a single global sort.
  • Tests

    • Test inputs adjusted to reverse timestamp ordering used for block-selection verification.


coderabbitai bot commented Mar 23, 2026

Walkthrough

Replace flat pending-tx list and global seen-ID set with a nested _pool: sender -> nonce -> tx plus _size; change add/remove/deduplication and block-selection to operate on per-sender nonces and timestamp-based replacement; adjust constructor default and remove _get_tx_id.

Changes

Cohort / File(s) and Summary:

  • Mempool core (minichain/mempool.py): Replaced _pending_txs + _seen_tx_ids with a _pool mapping and _size; add_transaction now indexes by (sender, nonce) and only replaces when the incoming timestamp is newer; capacity (max_size) is enforced only for new (sender, nonce) inserts; block selection snapshots _pool, sorts per-sender by nonce, and picks the earliest head tx across senders up to transactions_per_block; remove_transactions deletes by (sender, nonce) and prunes empty sender buckets; removed TRANSACTIONS_PER_BLOCK and _get_tx_id; __len__ returns _size; __init__ default transactions_per_block set to 100.
  • Tests (tests/test_protocol_hardening.py): Adjusted test input: timestamps in TestMempoolQueue.test_transactions_for_block_are_sorted_and_capped changed from 5000 - nonce to 5000 + nonce, reversing the generated timestamp ordering used in assertions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested labels

Python Lang

Suggested reviewers

  • Zahnentferner

Poem

🐇 I hop through pools by sender and nonce,
I count each hop and swap what’s once,
New timestamps win, old footprints pruned,
Heads chosen first, the rest attuned,
Hooray — a tidy mempool dance!

🚥 Pre-merge checks: ✅ 1 passed | ❌ 1 inconclusive

❌ Failed checks (1 inconclusive)

  • Title check ❓ Inconclusive
    Explanation: The title 'minimization in mempool' is vague and does not clearly convey the main technical change. While it relates to optimization, it lacks specificity about the nature of the refactoring (e.g., data structure change, algorithm optimization) that would help a teammate quickly understand the PR's purpose.
    Resolution: Consider a more descriptive title such as 'Refactor mempool storage to O(1) dictionary-based lookup' or 'Optimize mempool with nested sender-nonce dictionary structure' to clearly communicate the primary architectural change.

✅ Passed checks (1 passed)

  • Description check ✅ Passed
    Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.




@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@minichain/mempool.py`:
- Around line 35-38: The current block-selection code holds self._lock while
sorting, which serializes add_transaction/remove_transactions; instead, inside
the critical section (under self._lock) capture a shallow snapshot of
transactions from self._pool into a local list and then release the lock, and
perform txs.sort(...) and slice (txs[: self.transactions_per_block]) outside the
lock; update the method that contains this code (referencing self._lock,
self._pool, and transactions_per_block) so only the snapshot collection is done
under the lock and the O(N log N) sort runs after the lock is released.
- Around line 20-31: Currently the code calls self._pool.setdefault(tx.sender,
{}) before checking capacity, which creates empty per-sender buckets even when
the mempool is full; change the logic to first probe the bucket with
self._pool.get(tx.sender) (or check existing = self._pool.get(tx.sender,
{}).get(tx.nonce)), perform the duplicate check (existing and existing.tx_id ==
tx.tx_id) and the capacity check (if not existing and self._size >=
self.max_size) before mutating self._pool, and only call setdefault or assign a
new sub-dict and increment self._size when you are actually accepting and
storing the transaction; keep references to self._pool, self._size,
self.max_size, tx.sender, tx.nonce, and tx.tx_id to locate the changes.
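The snapshot-then-sort pattern the first comment asks for can be sketched as follows, assuming dict-shaped transactions with a `timestamp` key (the real `minichain` types may differ):

```python
import threading


class Mempool:
    def __init__(self, transactions_per_block: int = 100) -> None:
        self._lock = threading.Lock()
        self._pool: dict[str, dict[int, dict]] = {}  # sender -> nonce -> tx
        self.transactions_per_block = transactions_per_block

    def transactions_for_block(self) -> list[dict]:
        # Hold the lock only long enough to take a shallow snapshot; the
        # O(N log N) sort runs after release, so concurrent add/remove
        # calls are not serialized behind it.
        with self._lock:
            txs = [tx for bucket in self._pool.values() for tx in bucket.values()]
        txs.sort(key=lambda t: t["timestamp"])
        return txs[: self.transactions_per_block]
```

The snapshot copies only references, so the critical section stays O(N) with a small constant, and the sorted result is built entirely from local state.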

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd0d3d6c-6f70-4f6d-b26e-400508a08cb4

📥 Commits

Reviewing files that changed from the base of the PR and between 05ac7ea and fd9def2.

📒 Files selected for processing (1)
  • minichain/mempool.py


@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@minichain/mempool.py`:
- Around line 20-31: When reinserting a tx into the mempool (the block shown
that reads existing = self._pool.get(tx.sender, {}).get(tx.nonce)), guard
against overwriting a newer replacement: only write pool[tx.nonce] = tx and
increment self._size when the slot is currently empty or contains the same
tx_id; if an existing entry with a different tx_id is present, skip the
overwrite and do not change self._size (also keep the max_size check when
inserting new slots). Update the logic around existing / self._size /
pool[tx.nonce] to enforce these conditions so a rejected-block restore cannot
revert a newer replacement.
- Around line 36-39: The current code flattens all mempool txs then
sorts/slices, which lets later nonces with earlier timestamps crowd out a
sender's lower nonce; fix by building the candidate window from per-sender
nonce-frontiers: iterate self._pool per sender, sort each sender's pool by nonce
(ascending), take that sender's lowest-nonce pending tx (and optionally
contiguous next nonces if you want deeper per-sender window), collect those
frontier txs into txs, then sort that candidate list by (timestamp, sender,
nonce) and return the first self.transactions_per_block entries; reference
self._pool, transactions_per_block, and the txs variable when making this
change.
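The reinsert guard from the first comment can be sketched as a standalone function over the nested pool (dict-shaped transactions and the `restore_transaction` name are illustrative, not the project's API):

```python
def restore_transaction(pool: dict, size: int, tx: dict) -> tuple[int, bool]:
    """Reinsert a tx from a rejected block; never clobber a newer replacement."""
    existing = pool.get(tx["sender"], {}).get(tx["nonce"])
    if existing is not None and existing["tx_id"] != tx["tx_id"]:
        return size, False     # slot now holds a different (newer) tx: keep it
    pool.setdefault(tx["sender"], {})[tx["nonce"]] = tx
    if existing is None:
        size += 1              # only filling an empty slot grows the count
    return size, True
```

The key property is that a restore is idempotent for the same tx_id but a no-op when the slot was already replaced, so a rejected block cannot roll back a later replacement.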
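The frontier-based selection from the second comment can be sketched as follows (again with illustrative dict-shaped transactions):

```python
def transactions_for_block(pool: dict, transactions_per_block: int) -> list[dict]:
    """Build candidates from per-sender nonce frontiers, not a global sort."""
    frontier = []
    for bucket in pool.values():
        # Only each sender's lowest pending nonce is a candidate, so a later
        # nonce with an earlier timestamp cannot crowd out the lower nonce.
        frontier.append(min(bucket.values(), key=lambda t: t["nonce"]))
    frontier.sort(key=lambda t: (t["timestamp"], t["sender"], t["nonce"]))
    return frontier[:transactions_per_block]
```

A deeper per-sender window (contiguous next nonces, as the comment suggests) would extend the frontier per sender rather than change the sort key.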

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7e44c830-db11-47bd-b09d-1bd4f36973b9

📥 Commits

Reviewing files that changed from the base of the PR and between fd9def2 and 5a27e82.

📒 Files selected for processing (1)
  • minichain/mempool.py


@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@minichain/mempool.py`:
- Around line 47-64: The selection loop in the mempool (function handling
selected/transactions_per_block using snapshot, best_tx, best_sender) only
compares tx.timestamp so ties depend on dict traversal order; change the
comparison to use a deterministic tuple key such as (tx.timestamp, sender,
tx.nonce) or (tx.timestamp, sender_id, tx.nonce) when deciding
best_tx/best_sender so heads with equal timestamps break ties consistently
across runs; implement the tuple construction and compare tuples instead of just
timestamp when selecting txs from snapshot in the loop.
- Around line 20-37: The bug is that _size is incremented even when an existing
(sender, nonce) entry is replaced; change the logic in the block using variables
existing, self._pool, and self._size so that self._size is only incremented when
inserting a new slot (i.e., when existing is falsy). Concretely, move the
self._size += 1 so it runs only in the branch that executes
self._pool.setdefault(tx.sender, {})[tx.nonce] = tx when existing is None (do
not increment on replacements), ensuring __len__() reflects actual pool
cardinality.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5226d50b-7d42-4b88-8570-4ff2c64b3563

📥 Commits

Reviewing files that changed from the base of the PR and between 5a27e82 and 71d44e6.

📒 Files selected for processing (2)
  • minichain/mempool.py
  • tests/test_protocol_hardening.py


@coderabbitai bot left a comment


♻️ Duplicate comments (1)
minichain/mempool.py (1)

44-64: ⚠️ Potential issue | 🟠 Major

Stop exposing higher nonces after a gap.

After Line 62 advances best_sender unconditionally, a sender with pending nonces 5 and 7 can still occupy two capped candidate slots even though 7 is not executable yet. That lets nonce gaps crowd out other senders' runnable heads.

🛠️ Proposed fix

```diff
-        for txs in snapshot.values():
-            txs.sort(key=lambda t: t.nonce)
+        for sender, txs in snapshot.items():
+            txs.sort(key=lambda t: t.nonce)
+            contiguous = []
+            next_nonce = None
+            for tx in txs:
+                if next_nonce is None or tx.nonce == next_nonce:
+                    contiguous.append(tx)
+                    next_nonce = tx.nonce + 1
+                else:
+                    break
+            snapshot[sender] = contiguous
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@minichain/mempool.py` around lines 44 - 64, The selection loop currently
allows non-executable higher-nonce transactions to occupy slots; fix by only
considering a sender's head tx (snapshot[sender][0]) if its nonce equals the
sender's next executable nonce (e.g., compare txs[0].nonce to the sender's
current/next nonce from whatever store you have, such as
self.get_next_nonce(sender) or self.account_nonces[sender]); skip that sender
entirely for this selection round if there is a gap so higher nonces cannot be
chosen as best_tx and crowd out other senders when building selected up to
self.transactions_per_block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@minichain/mempool.py`:
- Around line 44-64: The selection loop currently allows non-executable
higher-nonce transactions to occupy slots; fix by only considering a sender's
head tx (snapshot[sender][0]) if its nonce equals the sender's next executable
nonce (e.g., compare txs[0].nonce to the sender's current/next nonce from
whatever store you have, such as self.get_next_nonce(sender) or
self.account_nonces[sender]); skip that sender entirely for this selection round
if there is a gap so higher nonces cannot be chosen as best_tx and crowd out
other senders when building selected up to self.transactions_per_block.
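The contiguous-prefix idea from the proposed fix can be sketched as a standalone helper (dict-shaped transactions are illustrative; the real code operates on the snapshot in place):

```python
def executable_prefix(txs: list[dict]) -> list[dict]:
    """Keep only the contiguous run of nonces starting from the lowest one."""
    txs = sorted(txs, key=lambda t: t["nonce"])
    prefix: list[dict] = []
    next_nonce = None
    for tx in txs:
        if next_nonce is not None and tx["nonce"] != next_nonce:
            break                  # gap: higher nonces are not executable yet
        prefix.append(tx)
        next_nonce = tx["nonce"] + 1
    return prefix
```

With pending nonces 5 and 7, only nonce 5 survives, so the gap behind nonce 7 can no longer occupy a capped candidate slot.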

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6466410a-5e80-4a0f-9a88-3774ba96d507

📥 Commits

Reviewing files that changed from the base of the PR and between 71d44e6 and 98c3526.

📒 Files selected for processing (1)
  • minichain/mempool.py

