DIPs testing#67

Draft
MoonBoi9001 wants to merge 21 commits into main from samuel/dips-dev-environment

@MoonBoi9001
Member

Dev environment for end-to-end DIPs testing against Horizon contracts. Sharing as a draft.

@MoonBoi9001 force-pushed the samuel/dips-dev-environment branch 3 times, most recently from cb0c65b to 39e989f (March 9, 2026 23:13)
@MoonBoi9001 changed the title from "DIPs dev environment with local source mounts" to "DIPs testing" (Mar 10, 2026)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MoonBoi9001 force-pushed the samuel/dips-dev-environment branch from 39e989f to 28089bd (March 10, 2026 01:50)
MoonBoi9001 and others added 17 commits March 10, 2026 16:23
Parameterize run scripts (indexer-agent, indexer-service, tap-agent, graph-node)
with env var overrides so the same scripts work for both the primary indexer and
N extras. Add gen-extra-indexers.py which produces a compose override file with
per-indexer postgres, graph-node, agent, service, and tap-agent stacks. Protocol
subgraph reads go to the primary graph-node via PROTOCOL_GRAPH_NODE_HOST.

Harden startup reliability for concurrent multi-indexer launches:
- Replace nodemon with a retry loop (nodemon hangs forever on crash)
- Serialize yarn install and cargo build via flock across shared mounts
- Add wait_for_rpc readiness check with curl fallback for non-foundry containers
- Use unless-stopped restart policy and retry_cast wrapper in registration
- Tune healthchecks with start_period for graph-node and agent containers
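
The flock-based serialization in the list above can be illustrated with a minimal Python sketch. It is not the code from the run scripts (those are shell); the `exclusive_lock` helper and the lock path are hypothetical, and it assumes a POSIX host where `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    # Hypothetical helper: the lock file is created on first use and never deleted.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    lock_file = os.path.join(tempfile.gettempdir(), "shared-build.lock")
    with exclusive_lock(lock_file):
        # Only one container at a time would run its install/build step here.
        print("holding build lock")
```

Because the lock lives on a file in the shared mount, every container that opens the same path queues behind the current holder instead of running a concurrent install or build.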

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove start-indexing-extra from agent depends_on so compose no longer
walks the full init chain (graph-contracts -> start-indexing -> start-
indexing-extra) on every `up -d`. This was causing chain container
bounces and cascading DNS failures when adding indexers to a running
stack. Agents now poll for on-chain staking (90 attempts, 5s interval)
instead of hard-failing, allowing registration to run in parallel.
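
A bounded polling loop of the kind described here might look like the following sketch. The 90 attempts and 5s interval come from the commit message; the `poll_until` name, the `check` callback, and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time


def poll_until(check, attempts=90, interval=5.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number that succeeded.

    Raises TimeoutError once the attempt budget is exhausted.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(interval)  # no sleep after the final failed attempt
    raise TimeoutError(f"condition not met after {attempts} attempts")
```

In the agent's case, `check` would be whatever query reports that the indexer's stake is visible on-chain; that query is not shown in this PR conversation, so it is left as a callback.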

Additional reliability and resource improvements:
- Use --no-deps --no-recreate in add-indexers skill
- Add dns_opt (timeout:2, attempts:5) to all long-running services
- Add mem_limit to chain (512m) and all generated services
- Cap chain Node.js heap at 384MB via NODE_OPTIONS
- Reduce extra postgres max_connections from 1000 to 200

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each extra indexer now gets a unique operator derived from a mnemonic of
the form "test*11 {bip39_word}" instead of sharing ACCOUNT0. This matches
production topology where each indexer has an independent operator.

The generator validates BIP39 checksums at import time, derives operator
addresses via eth_account, and threads them through compose services and
the on-chain registration block.
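
Why only some final words are usable follows from the BIP39 layout: a 12-word mnemonic encodes 128 bits of entropy plus a 4-bit checksum, so once the first eleven words are fixed, the last word carries 7 free entropy bits and 4 checksum bits. The sketch below enumerates the valid final-word indices. It works on wordlist indices rather than words to avoid embedding the 2048-word list; the function name is hypothetical and this is not the generator's actual code.

```python
import hashlib


def valid_last_word_indices(first_eleven):
    """Return the 128 wordlist indices that complete a valid 12-word BIP39 mnemonic."""
    if len(first_eleven) != 11 or any(not 0 <= i < 2048 for i in first_eleven):
        raise ValueError("expected eleven indices in the range 0..2047")

    # Eleven words contribute 11 bits each: the first 121 bits of entropy.
    prefix = 0
    for index in first_eleven:
        prefix = (prefix << 11) | index

    results = []
    for tail in range(128):  # the remaining 7 entropy bits
        entropy = (prefix << 7) | tail  # full 128-bit entropy
        digest = hashlib.sha256(entropy.to_bytes(16, "big")).digest()
        checksum = digest[0] >> 4  # first 4 bits of the SHA-256 digest
        results.append((tail << 4) | checksum)
    return results
```

For eleven copies of the word at index 0 ("abandon"), the first valid final index is 3 ("about"), which matches the well-known all-zero-entropy test mnemonic. Mapping indices back to words requires the English BIP39 wordlist.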

Two bugs fixed in the registration block:
- setOperator argument order was (operator, verifier) but the contract
  expects (verifier, operator). The SubgraphService verifier triggers a
  legacy code path in HorizonStaking that reads _legacyOperatorAuth,
  which is only written when the verifier is the first argument.
- Operator auth was inside the staking if/else, so re-runs after a
  partial failure would skip authorization. Now runs unconditionally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d, network-status contract health

Register SubgraphService on RewardsManager in the deploy scripts so
allocation operations (reallocate, DIPs agreement acceptance) don't revert
with "Not a rewards issuer". Add contract health check to network-status.py
to surface this misconfiguration immediately.

Configure IISA cronjob to refresh the API's score cache after writing, and
set the local reload interval to 120s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dev containers (indexer-service, indexer-agent, dipper) now start cargo/yarn
builds immediately on container start, waiting for dependencies (config volume,
gateway, iisa) in parallel. Previously each waited for the full compose
dependency chain before starting compilation.

Changes:
- lib.sh: added wait_for_url() and wait_for_config() polling helpers
- run-dips.sh (indexer-service, indexer-agent): background build, parallel dep wait
- run.sh (dipper): background cargo build, parallel dep wait
- dips.yaml: depends_on !override to relax compose deps, SCORING_INTERVAL 30s
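
The startup pattern in this commit (build in the background, wait for dependencies in parallel, launch once both are done) is implemented in shell scripts that are not shown here. As a rough Python analogue, with every name below hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def start_with_parallel_waits(build, waits, launch):
    """Run build() and every wait callable concurrently, then call launch()."""
    with ThreadPoolExecutor(max_workers=len(waits) + 1) as pool:
        build_future = pool.submit(build)
        wait_futures = [pool.submit(wait) for wait in waits]
        # result() re-raises any exception, so a failed build or wait aborts startup.
        build_future.result()
        for future in wait_futures:
            future.result()
    return launch()
```

The total startup time becomes roughly the longest of the build and the dependency waits, rather than their sum.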

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RecurringCollector uses the Authorizable pattern -- signers must be
explicitly authorized before their EIP-712 signatures are accepted. Without
this, every DIPs on-chain acceptance reverts with RecurringCollectorInvalidSigner.

The tap-escrow-manager only authorizes on PaymentsEscrow (for TAP), not
RecurringCollector (for DIPs). Added self-authorization of ACCOUNT0 in
start-indexing after allocations are active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- BUGS.md: added BUG-013 (RCA metadata ABI encoding mismatch, root cause
  was version enum value, not encoding format)
- add-indexers skill: replaced fixed sleeps with polling loops
- .environment: no functional changes (whitespace)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- dipper run.sh: added chain_listener config pointing at indexing-payments
  subgraph for on-chain acceptance event monitoring
- dips.yaml: added REDPANDA_GATEWAY_IDS=local to iisa-cronjob environment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- graph-contracts: ignore Tenderly verification failures (|| true) so
  contract deployment exits 0 and doesn't block the cascade
- indexer-service: wait for network subgraph before launching binary,
  preventing crash-loop when subgraph-deploy hasn't finished yet
- indexer-service: increase restart policy from on-failure:3 to on-failure:15
- start-indexing: fix RecurringCollector auth proof construction (removed
  stray --rpc-url flag and broken concat-hex fallback)
- fresh-deploy skill: use --no-build by default (scripts are volume-mounted),
  removed obsolete cascade/nonce-race steps, renumbered

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split registration into two phases: sequential ACCOUNT0 funding (shared
nonce, must be serial) and parallel per-indexer setup (approve, stake,
setOperator -- each uses its own key, no nonce conflicts). For 5 indexers
this cuts registration from ~2.5 minutes to ~30 seconds.

Also uses --confirmations=0 for funding transactions (fire-and-forget,
next tx will see the state).
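
The two-phase split can be sketched as follows. This is an illustration of the ordering constraint only, not the registration script itself; `fund` and `setup` are hypothetical callables standing in for the ACCOUNT0 funding transaction and the per-indexer approve/stake/setOperator sequence.

```python
from concurrent.futures import ThreadPoolExecutor


def register_indexers(indexers, fund, setup):
    """Fund indexers serially, then run per-indexer setup in parallel."""
    # Phase 1: every funding tx is sent from the same account, so the nonces
    # must be consumed in order. Run strictly one after another.
    for indexer in indexers:
        fund(indexer)

    # Phase 2: each indexer signs with its own key, so there is no shared
    # nonce and the setups can proceed concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(indexers))) as pool:
        return list(pool.map(setup, indexers))  # results keep input order
```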

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allocations are created during the reconciliation cycle. With the default
120s interval, allocations can take up to 2 minutes to appear after indexing
rules are set. A 15s interval keeps the add-indexers flow responsive.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MoonBoi9001 force-pushed the samuel/dips-dev-environment branch from 7c68181 to 0345ff8 (March 18, 2026 23:15)
MoonBoi9001 and others added 3 commits March 18, 2026 18:26
The subgraphDeployment field is a relationship entity, not a string.
Filtering with subgraphDeployment: "QmPdb..." returned zero results.
Changed to fetch all active allocations and filter client-side by
ipfsHash. Also updated reconciliation cycle note to reflect 15s
local dev polling interval.
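
A client-side filter along these lines might look like the sketch below. The `subgraphDeployment` and `ipfsHash` field names come from the commit message; the function name and the shape of the sample records are assumptions.

```python
def allocations_for_deployment(allocations, ipfs_hash):
    """Keep only allocations whose related deployment has the given ipfsHash."""
    return [
        allocation
        for allocation in allocations
        # subgraphDeployment is a nested entity, so compare its ipfsHash field.
        if (allocation.get("subgraphDeployment") or {}).get("ipfsHash") == ipfs_hash
    ]


# Hypothetical sample data for illustration only.
sample = [
    {"id": "0x01", "subgraphDeployment": {"ipfsHash": "QmExampleA"}},
    {"id": "0x02", "subgraphDeployment": {"ipfsHash": "QmExampleB"}},
    {"id": "0x03", "subgraphDeployment": None},
]
matching = allocations_for_deployment(sample, "QmExampleA")
```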

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The gateway's candidate-selection algorithm heavily favors the primary
indexer (highest stake), so extras never build Redpanda history naturally.
The skill now pauses the primary, sends 200 queries to build history for
extras, unpauses, and resumes all subgraphs that the agent may have
paused during the outage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use .get() with defaults instead of direct dict access for fatal_error,
health, chain_head, and latest_block. DIPs allocations on test subgraphs
have no network field, causing KeyError on the old code.

Also removed unused active_indexer_ids variable and fixed f-string lint.
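
The defensive-access change can be illustrated with a small sketch. The field names are taken from the commit message, but the flat record shape, the `summarize_status` name, and the chosen default values are assumptions rather than the script's real structure.

```python
def summarize_status(status):
    """Build a display record that tolerates missing fields."""
    return {
        "health": status.get("health", "unknown"),
        "fatal_error": status.get("fatal_error"),  # None when absent
        "chain_head": status.get("chain_head", 0),
        "latest_block": status.get("latest_block", 0),
        "network": status.get("network", "n/a"),  # absent on test subgraphs
    }
```

With direct indexing (`status["network"]`), a record lacking the field raises `KeyError`; `.get()` with a default lets the report render a placeholder instead.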

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 participant