[FEATURE]: Implement Canonical Serialization for Blocks and Transactions #67
sanaica wants to merge 8 commits into StabilityNexus:main
Conversation
Walkthrough
Adds a public Block canonical_payload serialization API.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Node
    participant Block
    participant P2P
    participant Peer
    Node->>Block: create block (optionally set miner)
    Node->>Block: call canonical_payload()
    Block-->>Node: canonical JSON bytes
    Node->>P2P: broadcast_block(block)
    P2P->>P2P: decode canonical_payload -> data (dict)
    P2P->>P2P: envelope = {"type":"block","data": data, "miner": block.miner}
    P2P->>Peer: send canonical_json_dumps(envelope) over TCP
    Peer->>Peer: validate message has `type` & `data` and process
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 of 4 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/block.py`:
- Around line 121-124: broadcast_block currently sends block.to_dict() instead
of using Block.canonical_payload, which breaks deterministic wire encoding;
change broadcast_block to transmit the canonical bytes from canonical_payload
and remove the miner field from the serialized canonical block, instead
injecting the dynamic miner info only in the outer envelope (e.g., send
{payload: canonical_payload, miner: miner_id} or equivalent) so the canonical
serialization (canonical_json_bytes / canonical_payload) remains unchanged;
update any receiver/deserializer to read the envelope, verify/parse the
canonical payload with canonical_json_bytes semantics, and ensure
canonical_payload, to_dict, and the miner injection logic (previously inline in
broadcast_block) are the only touched symbols.
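The envelope shape this prompt describes could be sketched roughly as follows. A toy `Block` and a `canonical_json_dumps` helper stand in for minichain's actual classes; the field names here are assumptions, not the project's real schema:

```python
import json

def canonical_json_dumps(obj) -> str:
    # Deterministic encoding: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

class Block:
    """Toy stand-in for minichain's Block; field names are assumptions."""
    def __init__(self, index, prev_hash, transactions, miner=None):
        self.index = index
        self.prev_hash = prev_hash
        self.transactions = transactions
        self.miner = miner

    @property
    def canonical_payload(self) -> bytes:
        # miner is deliberately excluded so the canonical bytes stay stable.
        return canonical_json_dumps({
            "index": self.index,
            "prev_hash": self.prev_hash,
            "transactions": self.transactions,
        }).encode()

def build_block_envelope(block: Block) -> str:
    """Canonical data inside; dynamic miner info only in the outer envelope."""
    data = json.loads(block.canonical_payload)
    envelope = {"type": "block", "data": data, "miner": block.miner}
    return canonical_json_dumps(envelope)

print(build_block_envelope(Block(1, "00ab", [], miner="peer-1")))
```

With this split, two blocks that differ only in `miner` share identical canonical bytes, so the hash/signature surface is unaffected by who relays the block.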
In `@tests/test_serialization.py`:
- Around line 32-44: Add a direct unit test that asserts Block.canonical_payload
is deterministic: call block.canonical_payload (or Block.canonical_payload() if
it's a method) twice on the same Block instance (constructed as in
test_block_hash_consistency with Transaction(sender="A", receiver="B",
amount=10, nonce=5) and difficulty=2) and assert the returned payload values are
identical; also create a second Block with identical fields and assert its
canonical_payload equals the first block's canonical_payload to verify
cross-instance determinism. Ensure you reference Block.canonical_payload and use
the same construction pattern as in test_block_hash_consistency so the new
assertions integrate with existing tests.
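The requested test might look like this sketch. Since the real `Block`/`Transaction` classes live in minichain, a minimal `canonical_payload` stand-in is used here; the field values mirror the ones quoted above:

```python
import json

def canonical_payload(fields: dict) -> bytes:
    # Stand-in for Block.canonical_payload: sorted keys, compact separators.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()

def test_canonical_payload_determinism():
    # Mirrors the construction in test_block_hash_consistency:
    # Transaction(sender="A", receiver="B", amount=10, nonce=5), difficulty=2.
    fields = {
        "transactions": [{"sender": "A", "receiver": "B", "amount": 10, "nonce": 5}],
        "difficulty": 2,
    }
    # Same instance, called twice -> identical payload bytes.
    assert canonical_payload(fields) == canonical_payload(fields)
    # A second "block" with identical fields (keys in a different insertion
    # order) -> cross-instance determinism.
    fields2 = {
        "difficulty": 2,
        "transactions": [{"nonce": 5, "amount": 10, "receiver": "B", "sender": "A"}],
    }
    assert canonical_payload(fields) == canonical_payload(fields2)

test_canonical_payload_determinism()
print("canonical_payload determinism test passed")
```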
- Around line 45-52: The test entrypoint currently catches AssertionError and
only prints a message, which hides failures; update the main guard so assertion
failures propagate (either remove the try/except entirely or re-raise the caught
exception after printing). Locate the block that calls
test_raw_data_determinism(), test_transaction_id_stability(), and
test_block_hash_consistency() under if __name__ == "__main__": and modify it to
let AssertionError bubble up (or call raise after the print) so test runners see
a non-zero exit on failures.
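A minimal sketch of the fixed entrypoint, with placeholder test bodies (the real ones live in tests/test_serialization.py):

```python
import json

def test_raw_data_determinism():
    # Placeholder body; real assertions live in tests/test_serialization.py.
    assert json.dumps({"a": 1}, sort_keys=True) == json.dumps({"a": 1}, sort_keys=True)

def test_transaction_id_stability():
    assert True  # placeholder

def test_block_hash_consistency():
    assert True  # placeholder

if __name__ == "__main__":
    # No try/except around the calls: a failed assert now terminates the
    # process with a traceback and a non-zero exit code, so test runners
    # and CI actually see the failure.
    test_raw_data_determinism()
    test_transaction_id_stability()
    test_block_hash_consistency()
    print("All serialization tests passed")
```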
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f3d3c1f0-fc6b-457e-8bc4-e8ba4987f6f3
📒 Files selected for processing (2)
minichain/block.py, tests/test_serialization.py
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/p2p.py`:
- Around line 337-342: broadcast_block() now places miner at the top-level of
payload causing receivers that expect miner inside payload["data"] to drop it
and fall back to BURN_ADDRESS; fix by keeping backward compatibility: inject
miner into the existing block_data object (e.g., set block_data["miner"] =
miner) and then set payload = {"type":"block","data": block_data} so receivers
(main.py reading data["data"]["miner"]) keep working, or alternatively update
the receiver to read the top-level "miner" field (adjust the code that processes
incoming payloads in main.py to use payload["miner"] if present). Ensure the
change references the payload construction in broadcast_block, the block_data
variable, and the receiver logic that reads data["data"] so both sides remain
consistent.
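The backward-compatible option could be sketched like this (the `block_data` shape is an assumption):

```python
import json

def canonical_json_dumps(obj) -> str:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def build_payload(block_canonical: bytes, miner: str) -> dict:
    """Inject miner back into block_data so receivers that read
    data["data"]["miner"] (as main.py reportedly does) keep working."""
    block_data = json.loads(block_canonical)
    block_data["miner"] = miner          # inside the inner dict, not top-level
    return {"type": "block", "data": block_data}

canonical = canonical_json_dumps({"index": 3, "prev_hash": "00ff"}).encode()
payload = build_payload(canonical, miner="peer-7")
print(payload["data"]["miner"])          # peer-7
```

Note this keeps the wire format unchanged for old receivers while still transmitting canonical bytes; the canonical payload itself is never mutated, only the decoded copy.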
- Around line 205-206: _validate_message currently assumes message is a dict and
calls set(message), which raises TypeError for non-dict JSON values; modify
_validate_message to first check isinstance(message, dict) and immediately
return False (and log the invalid message) if not a dict, then proceed to
validate the presence of "type" and "data" keys; update any log in the
listen/handler path that uses json.loads(...) to ensure invalid messages are
logged and ignored rather than crashing the task.
In `@tests/test_serialization.py`:
- Around line 19-29: The tests rely on auto-generated timestamps/signatures so
they only check same-instance stability; update test_transaction_id_stability
and test_block_serialization_determinism to pass explicit deterministic values
(e.g., timestamp and signature where applicable) into the Transaction and Block
constructors (reference: Transaction.__init__/tx_id and
Block.__init__/block_hash) and create two separate instances with identical
inputs, then assert tx1.tx_id == tx2.tx_id (and similarly block1.block_hash ==
block2.block_hash) to verify cross-instance determinism.
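The cross-instance pattern could look like this sketch, using a minimal `Transaction` stand-in rather than minichain's real class:

```python
import hashlib, json

class Transaction:
    """Minimal stand-in; the real fields live in minichain's Transaction."""
    def __init__(self, sender, receiver, amount, nonce, timestamp, signature=""):
        self.fields = {
            "sender": sender, "receiver": receiver, "amount": amount,
            "nonce": nonce, "timestamp": timestamp, "signature": signature,
        }

    @property
    def tx_id(self) -> str:
        raw = json.dumps(self.fields, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(raw.encode()).hexdigest()

# Explicit timestamp/signature instead of auto-generated values, so two
# separately constructed instances with identical inputs must agree.
tx1 = Transaction("A", "B", 10, 5, timestamp=1700000000, signature="sig")
tx2 = Transaction("A", "B", 10, 5, timestamp=1700000000, signature="sig")
assert tx1.tx_id == tx2.tx_id            # cross-instance determinism
print(tx1.tx_id[:16])
```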
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bcdb9803-3e58-4df9-ad9c-3f3c939f7187
📒 Files selected for processing (2)
minichain/p2p.py, tests/test_serialization.py
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/p2p.py`:
- Around line 339-349: Do not inject the external "miner" into the deserialized
block dict in p2p.py (the block_data = json.loads(block.canonical_payload...);
block_data["miner"] = miner) because miner must be part of the canonical payload
or derived from the block's coinbase; remove that injection and instead either
(A) ensure Block.to_dict()/Block.canonical_payload includes the miner when the
block is created so miner is inside block.canonical_payload, or (B) validate
that the claimed miner matches a coinbase/first transaction inside block_data
before using it for state changes (and reject or ignore mismatches). Update code
paths that rely on payload.get("miner") (e.g., the reward crediting in main.py)
to only use a miner value that is present inside block.canonical_payload or
derived from the block's transactions.
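Option (B) could be sketched as follows; the coinbase convention assumed here (first transaction, sender `None`, receiver = miner) is an illustration, not minichain's documented format:

```python
def miner_from_block_data(block_data: dict):
    """Derive the miner from the block's coinbase (first transaction)
    instead of trusting a field injected after serialization."""
    txs = block_data.get("transactions") or []
    if not txs:
        return None
    coinbase = txs[0]
    if coinbase.get("sender") is not None:
        return None                      # not a coinbase -> no trusted miner
    return coinbase.get("receiver")

def claimed_miner_is_valid(block_data: dict, claimed: str) -> bool:
    # Reject (or ignore) any claimed miner that the block itself
    # does not substantiate.
    return miner_from_block_data(block_data) == claimed

block = {"transactions": [{"sender": None, "receiver": "peer-9", "amount": 50}]}
print(claimed_miner_is_valid(block, "peer-9"))   # True
print(claimed_miner_is_valid(block, "peer-1"))   # False: reject mismatch
```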
- Around line 46-50: The start() method currently binds the listener to
"0.0.0.0" which exposes the TCP endpoint on all interfaces; change start(self,
port: int = 9000) to accept a host parameter (e.g. start(self, port: int = 9000,
host: str = "127.0.0.1")) and use that host when calling
asyncio.start_server(self._handle_incoming, host, port), update the logger
message to include the host, and ensure _port/_server behavior remains the same
so operators can explicitly opt into binding to "0.0.0.0" by passing
host="0.0.0.0".
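A sketch of the suggested signature change (class and method names are simplified stand-ins for the real ones in minichain/p2p.py):

```python
import asyncio

class PeerNetwork:
    """Stand-in for the real P2P class."""
    async def _handle_incoming(self, reader, writer):
        writer.close()

    async def start(self, port: int = 9000, host: str = "127.0.0.1"):
        # Loopback by default; operators opt into exposing the endpoint by
        # passing host="0.0.0.0" explicitly.
        self._port = port
        self._server = await asyncio.start_server(self._handle_incoming, host, port)
        bound_host, bound_port = self._server.sockets[0].getsockname()[:2]
        print(f"Network: listening on {bound_host}:{bound_port}")

async def main():
    net = PeerNetwork()
    await net.start(port=0)              # port 0: let the OS pick a free port
    net._server.close()
    await net._server.wait_closed()

asyncio.run(main())
```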
- Around line 304-307: In _broadcast_raw remove the unnecessary json.dumps
serialization: the first assignment to line using json.dumps(payload) is
immediately overwritten by the canonical_json_dumps call, doubling work on the
hot path; delete the json.dumps-based line assignment and keep only the
canonical_json_dumps(payload) + "\n").encode() form so the function serializes
payload once (refer to function _broadcast_raw and the variable line).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 98b8d654-98b9-4374-af5b-73a576e8f442
📒 Files selected for processing (2)
minichain/p2p.py, tests/test_serialization.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
minichain/block.py (1)
68-77: ⚠️ Potential issue | 🔴 Critical

Including `miner` in `to_header_dict()` breaks backward compatibility with previously persisted blocks. Adding `miner` to the hash input changes the hash computation for all blocks, which causes a critical backward compatibility issue: when blocks are loaded from storage (via `Block.from_dict()`), the `miner` field will be `None` if it is missing from the persisted data. Since `to_header_dict()` now includes `"miner": self.miner`, the hash recomputation at `chain.py:21` will produce a different value than what was originally calculated when that block was created without the `miner` field in the hash input. This breaks chain validation for any previously saved blocks. If this codebase needs to preserve old blocks, either exclude `miner` from the hash input or implement a migration strategy for old blocks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/block.py` around lines 68 - 77, The change adding "miner" to Block.to_header_dict() alters the header used for hash computation and breaks validation of previously persisted blocks; revert the header shape used for hashing by removing "miner" from to_header_dict() (keep miner only in the full serialization like Block.to_dict() or add a separate to_full_dict()) so Block.from_dict()-loaded blocks still produce the same hash as originally computed in chain.py (where headers are re-hashed), or implement an explicit migration path that re-computes and updates stored block hashes—update the to_header_dict() function to exclude the miner field and ensure any serialization/deserialization code uses the separate full dict that includes miner.
♻️ Duplicate comments (1)
minichain/p2p.py (1)
304-307: 🧹 Nitpick | 🔵 Trivial

Remove the redundant `json.dumps()` call. Line 306 serializes `payload` with `json.dumps()`, but line 307 immediately overwrites `line` with `canonical_json_dumps()`. This doubles CPU work on every broadcast for no effect.

♻️ Proposed fix

```diff
 async def _broadcast_raw(self, payload: dict):
     """Send a JSON message to every connected peer."""
-    line = (json.dumps(payload) + "\n").encode()
     line = (canonical_json_dumps(payload) + "\n").encode()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/p2p.py` around lines 304 - 307, In _broadcast_raw remove the redundant serialization: delete the initial json.dumps(...) assignment to line so you only call canonical_json_dumps(payload) + "\n" and then .encode(); update the code around async def _broadcast_raw and the local variable line to stop double-serializing the payload and only perform the canonical_json_dumps step.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/p2p.py`:
- Around line 336-344: The broadcast_block signature was changed to remove the
miner parameter but main.py still calls await network.broadcast_block(mined,
miner=pk); fix by setting the miner on the block before broadcasting (e.g.,
assign mined.miner = pk just prior to the broadcast call in main.py) or
alternatively restore/accept the miner argument in broadcast_block and use it to
populate the payload (referencing broadcast_block and the payload["miner"]
assignment) so the call site and implementation agree.
---
Outside diff comments:
In `@minichain/block.py`:
- Around line 68-77: The change adding "miner" to Block.to_header_dict() alters
the header used for hash computation and breaks validation of previously
persisted blocks; revert the header shape used for hashing by removing "miner"
from to_header_dict() (keep miner only in the full serialization like
Block.to_dict() or add a separate to_full_dict()) so Block.from_dict()-loaded
blocks still produce the same hash as originally computed in chain.py (where
headers are re-hashed), or implement an explicit migration path that re-computes
and updates stored block hashes—update the to_header_dict() function to exclude
the miner field and ensure any serialization/deserialization code uses the
separate full dict that includes miner.
---
Duplicate comments:
In `@minichain/p2p.py`:
- Around line 304-307: In _broadcast_raw remove the redundant serialization:
delete the initial json.dumps(...) assignment to line so you only call
canonical_json_dumps(payload) + "\n" and then .encode(); update the code around
async def _broadcast_raw and the local variable line to stop double-serializing
the payload and only perform the canonical_json_dumps step.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e90a03ce-66f2-401f-8118-cfabba16f4f3
📒 Files selected for processing (2)
minichain/block.py, minichain/p2p.py
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/block.py`:
- Around line 91-96: The to_dict() method in Block currently always sets
"miner": self.miner which causes P2P validation to fail when miner is None;
update Block.to_dict() so it only adds the "miner" key when self.miner is not
None (and ideally only if it's a str) so peers' _validate_block_payload in
p2p.py will accept genesis/pre-mined blocks; locate the to_dict function (method
name: to_dict) and change the logic to conditionally include the miner field
when present.
- Around line 44-49: The constructor's miner parameter is missing a type
annotation; update the Block.__init__ signature to annotate miner (e.g., miner:
Optional[str] = None or the appropriate Miner type) and add the corresponding
attribute type on self.miner (ensure Optional is imported from typing if used)
so miner has an explicit type like self.miner: Optional[str] = miner.
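Both suggestions combined might look like this sketch (field set reduced for illustration):

```python
from typing import Optional

class Block:
    """Sketch combining both fixes: an explicit Optional[str] annotation
    for miner, and to_dict() omitting the key when miner is unset."""
    def __init__(self, index: int, miner: Optional[str] = None):
        self.index = index
        self.miner: Optional[str] = miner

    def to_dict(self) -> dict:
        d = {"index": self.index}
        if self.miner is not None:       # omit for genesis/pre-mined blocks
            d["miner"] = self.miner
        return d

print(Block(0).to_dict())                # {'index': 0}
print(Block(1, miner="peer-2").to_dict())
```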
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d4347d56-4e52-4289-9529-4e7a0ae5d8b2
📒 Files selected for processing (2)
minichain/block.py, minichain/p2p.py
@coderabbitai review
✅ Actions performed: Review triggered.

@coderabbitai resume
✅ Actions performed: Reviews resumed.

@coderabbitai review
✅ Actions performed: Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
minichain/p2p.py (1)
205-224: ⚠️ Potential issue | 🟠 Major

Validate the duplicated `miner` field, or drop it. `broadcast_block()` now emits `miner` inside `data` and again at the top level, but `_validate_message()` only checks that `"type"` and `"data"` exist. A peer can therefore send conflicting `miner` values and still pass schema validation, which leaves the block envelope ambiguous. Either keep a single source of truth, or reject block messages unless the top-level `miner` is a `str` equal to `data.get("miner")`, and omit it entirely when `block.miner` is `None`.

🔧 Minimal guard

```diff
 def _validate_message(self, message):
     if not isinstance(message, dict):
         logger.warning("Network: Received non-dict message")
         return False
     if not {"type", "data"}.issubset(message):
         return False
     msg_type = message.get("type")
     payload = message.get("data")
     if msg_type not in SUPPORTED_MESSAGE_TYPES:
         return False
+    if msg_type == "block" and "miner" in message:
+        if not isinstance(message["miner"], str):
+            return False
+        if not isinstance(payload, dict) or message["miner"] != payload.get("miner"):
+            return False
     validators = {
         "sync": self._validate_sync_payload,
         "tx": self._validate_transaction_payload,
         "block": self._validate_block_payload,
     }
     return validators[msg_type](payload)
```

```diff
-    payload["miner"] = block.miner
+    if block.miner is not None:
+        payload["miner"] = block.miner
```

Also applies to: 335-348
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/p2p.py` around lines 205 - 224, The block messages currently allow a duplicated conflicting "miner" (top-level and inside data) so update validation and emission: in _validate_message() (and/or _validate_block_payload()) enforce that for msg_type "block" the top-level "miner" must either be absent when data.get("miner") is None, or must be a str equal to data.get("miner") when present; reject the message otherwise, and ensure SUPPORTED_MESSAGE_TYPES logic still routes to validators; also change broadcast_block() to avoid emitting a top-level "miner" when block.miner is None (or ensure it sets both places consistently) so there is a single source of truth.
♻️ Duplicate comments (1)
minichain/block.py (1)
91-100: ⚠️ Potential issue | 🟠 Major

`miner` can still change without changing the block ID. `to_dict()`/`canonical_payload` now put `miner` on the wire and `from_dict()` accepts it back, but `compute_hash()` still hashes only `to_header_dict()`, which excludes `miner`. That means two different canonical block payloads can share the same `hash`; `minichain/p2p.py` then treats `payload["hash"]` as the block identity, so a relay can rewrite the miner without invalidating the block ID. Please either bind `miner` into the hashed contents or derive it from a transaction instead of a mutable side field. Also applies to: line 120, lines 128-131.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/block.py` around lines 91 - 100, The block identity can be altered by changing the mutable miner field because compute_hash() hashes only to_header_dict() (which excludes miner) while to_dict()/canonical_payload include miner; fix by binding miner into the hashed contents: update compute_hash() to include miner (e.g., have compute_hash() hash a header that includes self.miner or simply hash the same payload returned by to_dict()/to_header_dict() after adding miner) and make corresponding adjustments to to_header_dict() so it includes miner when present, ensuring canonical_payload, to_dict(), from_dict(), and compute_hash() all agree on the exact set of fields used for the block hash (alternatively, if miner must be derived, move miner into a transaction and remove the separate miner field so compute_hash()/to_header_dict() derive it from transactions).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@minichain/p2p.py`:
- Around line 205-224: The block messages currently allow a duplicated
conflicting "miner" (top-level and inside data) so update validation and
emission: in _validate_message() (and/or _validate_block_payload()) enforce that
for msg_type "block" the top-level "miner" must either be absent when
data.get("miner") is None, or must be a str equal to data.get("miner") when
present; reject the message otherwise, and ensure SUPPORTED_MESSAGE_TYPES logic
still routes to validators; also change broadcast_block() to avoid emitting a
top-level "miner" when block.miner is None (or ensure it sets both places
consistently) so there is a single source of truth.
---
Duplicate comments:
In `@minichain/block.py`:
- Around line 91-100: The block identity can be altered by changing the mutable
miner field because compute_hash() hashes only to_header_dict() (which excludes
miner) while to_dict()/canonical_payload include miner; fix by binding miner
into the hashed contents: update compute_hash() to include miner (e.g., have
compute_hash() hash a header that includes self.miner or simply hash the same
payload returned by to_dict()/to_header_dict() after adding miner) and make
corresponding adjustments to to_header_dict() so it includes miner when present,
ensuring canonical_payload, to_dict(), from_dict(), and compute_hash() all agree
on the exact set of fields used for the block hash (alternatively, if miner must
be derived, move miner into a transaction and remove the separate miner field so
compute_hash()/to_header_dict() derive it from transactions).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 74ae8c05-d8e5-43ee-afa1-5514d9c06be0
📒 Files selected for processing (2)
minichain/block.py, minichain/p2p.py
Hi! All automated tests have passed and CodeRabbit review is fully approved ✅
Addressed Issues:
Closes #2
Screenshots/Recordings:

Additional Notes:
This PR implements a strict, deterministic serialization format to ensure data integrity across the network.
Added a canonical_payload property to the Block class to support full block transmission for networking.
Ensured that all signing and hashing operations in Transaction and Block use sorted keys and compact separators, eliminating insignificant whitespace.
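In Python's standard library this typically comes down to two `json.dumps` arguments; a minimal illustration (not minichain's actual helper):

```python
import hashlib, json

def canonical_json_dumps(obj) -> str:
    """Sorted keys and compact separators remove the two main sources of
    byte-level nondeterminism in JSON encoding."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

a = {"receiver": "B", "sender": "A", "amount": 10}
b = {"amount": 10, "sender": "A", "receiver": "B"}   # same data, other order
assert canonical_json_dumps(a) == canonical_json_dumps(b)
print(canonical_json_dumps(a))           # {"amount":10,"receiver":"B","sender":"A"}
print(hashlib.sha256(canonical_json_dumps(a).encode()).hexdigest()[:16])
```

Because the two dicts serialize to identical bytes, any hash or signature computed over them agrees regardless of the key insertion order on the sending node.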
This work is a prerequisite for my GSoC 2026 proposal, which relies on immutable data structures for automated portfolio management.
AI Usage Disclosure:
[ ] This PR does not contain AI-generated code at all.
[x] This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.
I have used Gemini Thinking model.
Checklist
[x] My PR addresses a single issue, fixes a single bug or makes a single improvement.
[x] My code follows the project's code style and conventions.
[x] If applicable, I have made corresponding changes or additions to the documentation.
[x] If applicable, I have made corresponding changes or additions to tests.
[x] My changes generate no new warnings or errors.
[x] I have joined the Discord server and I will share a link to this PR with the project maintainers there.
[x] I have read the Contribution Guidelines.
[x] Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
[x] I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.
Summary by CodeRabbit
New Features
Tests
Bug Fixes / Network