Skip to content

Implement Edge Cookie (EC) identity system#582

Open
ChristianPavilonis wants to merge 11 commits intofeature/ssc-updatefrom
feature/edge-cookies-final
Open

Implement Edge Cookie (EC) identity system#582
ChristianPavilonis wants to merge 11 commits intofeature/ssc-updatefrom
feature/edge-cookies-final

Conversation

@ChristianPavilonis
Copy link
Collaborator

@ChristianPavilonis ChristianPavilonis commented Mar 26, 2026

Summary

  • Implement the complete Edge Cookie (EC) identity system (Epic Implement Edge Cookie (EC) identity system #532, Stories 1–11): first-party cookie generation, KV-backed identity graph with CAS concurrency, partner registry, pixel sync, server-to-server batch sync, pull sync dispatch, auction bidstream decoration, and consent-gated lifecycle management.
  • Migrate config from [edge_cookie] to [ec], restructure code into ec/ module tree, centralize cookie/response finalization in middleware, and refactor the Fastly adapter entrypoint for post-send background work.
  • Add Viceroy-driven end-to-end integration tests covering full EC lifecycle, consent withdrawal, concurrent partner syncs, batch sync auth, and identity lookup.

Changes

File Change
crates/trusted-server-adapter-fastly/src/main.rs Refactored to explicit Request::from_client() + send_to_client() for post-send pull sync; added EC route handlers (/sync, /identify, /api/v1/sync, /admin/partners/register); RouteOutcome struct for organic-route tracking
crates/trusted-server-core/src/ec/mod.rs New EcContext struct with two-phase lifecycle (read_from_request + generate_if_needed), consent integration, EC hash extraction
crates/trusted-server-core/src/ec/generation.rs EC ID generation ({64hex}.{6alnum}), IP normalization, format validation
crates/trusted-server-core/src/ec/cookies.rs EC cookie helpers with computed domain from publisher.domain
crates/trusted-server-core/src/ec/consent.rs Consent gating bridge for EC operations
crates/trusted-server-core/src/ec/kv.rs KV identity graph with CAS concurrency, create_or_revive, upsert_partner_id, tombstone writes, 300s last-seen debounce
crates/trusted-server-core/src/ec/kv_types.rs Identity graph schema: KvEntry, KvConsent, KvGeo, KvPartnerId, KvMetadata
crates/trusted-server-core/src/ec/partner.rs PartnerStore with API key hash verification, list_registered(), pull sync config validation
crates/trusted-server-core/src/ec/admin.rs POST /admin/partners/register with field validation, domain normalization, pull sync config checks
crates/trusted-server-core/src/ec/sync_pixel.rs GET /sync pixel endpoint: query validation, partner lookup, return-URL domain check, consent fallback, rate limiting, KV upsert
crates/trusted-server-core/src/ec/identify.rs GET /identify endpoint: response matrix (200/204/403), CORS validation, EID headers, per-partner UID headers
crates/trusted-server-core/src/ec/batch_sync.rs POST /api/v1/sync batch endpoint: Bearer auth, per-mapping validation, 200/207 semantics, rate limiting
crates/trusted-server-core/src/ec/pull_sync.rs Background pull sync dispatch: organic-route-only trigger, partner eligibility, HTTPS enforcement, URL allowlist, concurrency cap, rate limiting
crates/trusted-server-core/src/ec/eids.rs OpenRTB EID builder from KV partner data, base64-encoded header with truncation support
crates/trusted-server-core/src/ec/finalize.rs Centralized finalize_response(): cookie set/delete, KV tombstone writes, header propagation
crates/trusted-server-core/src/auction/endpoints.rs Auction EID resolution from KV identity graph
crates/trusted-server-core/src/auction/formats.rs Bidstream decoration with x-ts-eids, x-ts-ec-consent, per-partner headers
crates/trusted-server-core/src/settings.rs [ec] config section with passphrase, store names, pull_sync_concurrency; placeholder rejection
crates/trusted-server-core/src/edge_cookie.rs Removed (migrated to ec/ module tree)
crates/integration-tests/tests/common/ec.rs EC test helpers: cookie-aware client, minimal origin server, partner registration, sync/identify builders
crates/integration-tests/tests/frameworks/scenarios.rs 7 EC lifecycle scenarios: full flow, consent withdrawal, concurrent syncs, batch sync, auth rejection
crates/integration-tests/tests/integration.rs test_ec_lifecycle_fastly test runner
scripts/integration-tests.sh Fixed build env var to TRUSTED_SERVER__EC__PASSPHRASE
scripts/integration-tests-browser.sh Same env var fix
.github/actions/setup-integration-test-env/action.yml Same env var fix

Closes

Closes #533, Closes #535, Closes #536, Closes #537, Closes #538, Closes #539, Closes #540, Closes #541, Closes #542, Closes #543, Closes #544

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve — full lifecycle verified (EC generation → partner registration → pixel sync → identify → batch sync → consent withdrawal)
  • Integration tests: cargo test --manifest-path crates/integration-tests/Cargo.toml -- --include-ignored test_ec_lifecycle_fastly (7 scenarios pass)

EC Manual Testing Plan

Manual testing guide for the Edge Cookie (EC) identity system.
Run these tests against a local Viceroy instance with the full EC branch stack applied.

Prerequisites

1. Build the WASM binary

TRUSTED_SERVER__PUBLISHER__ORIGIN_URL="http://127.0.0.1:8888" \
TRUSTED_SERVER__PUBLISHER__PROXY_SECRET="integration-test-proxy-secret" \
TRUSTED_SERVER__EC__PASSPHRASE="integration-test-ec-secret" \
TRUSTED_SERVER__PROXY__CERTIFICATE_CHECK=false \
cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1

2. Start a minimal origin server

3. Start Viceroy

fastly compute serve

The server is now at http://127.0.0.1:7676.

4. Test credentials

Auth type Endpoint Credentials
Basic auth /admin/* admin / changeme
Bearer token /api/v1/sync The raw api_key value you used when registering a partner

Links

# EC Manual Testing Plan

Manual testing guide for the Edge Cookie (EC) identity system.
Run these tests against a local Viceroy instance with the full EC branch stack applied.

Prerequisites

1. Build the WASM binary

TRUSTED_SERVER__PUBLISHER__ORIGIN_URL="http://127.0.0.1:8888" \
TRUSTED_SERVER__PUBLISHER__PROXY_SECRET="integration-test-proxy-secret" \
TRUSTED_SERVER__EC__PASSPHRASE="integration-test-ec-secret" \
TRUSTED_SERVER__PROXY__CERTIFICATE_CHECK=false \
cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1

2. Start a minimal origin server

3. Start Viceroy

fastly compute serve

The server is now at http://127.0.0.1:7676.

4. Test credentials

Auth type Endpoint Credentials
Basic auth /admin/* admin / changeme
Bearer token /api/v1/sync The raw api_key value you used when registering a partner

Test 1: EC Cookie Generation (organic page load)

What you're testing: First visit to an organic route generates an EC cookie.

Steps:

  1. Open browser DevTools → Network tab
  2. Navigate to http://127.0.0.1:7676/ (or any non-EC path)
  3. Inspect the response headers

What to look for:

  • Response header Set-Cookie contains ts-ec=<64hex>.<6alnum>; Domain=.test-publisher.com; Path=/; Secure; SameSite=Lax; Max-Age=31536000
  • Response header x-ts-ec echoes the same EC ID value
  • In DevTools Application → Cookies, ts-ec cookie is present (note: may not appear on 127.0.0.1 due to domain mismatch — check response headers instead)
  • Status code is 200 (origin content proxied successfully)

With curl:

curl -v http://127.0.0.1:7676/ 2>&1 | grep -i 'set-cookie\|x-ts-ec'

Expected:

< set-cookie: ts-ec=abcdef01...64hex....Xyz123; Domain=.test-publisher.com; Path=/; Secure; SameSite=Lax; Max-Age=31536000
< x-ts-ec: abcdef01...64hex....Xyz123

Second request with the same EC (returning user):

EC_ID="<value from above>"
curl -v -H "Cookie: ts-ec=$EC_ID" http://127.0.0.1:7676/
  • Same x-ts-ec header echoed back
  • No new Set-Cookie if the cookie is unchanged (or re-sets same value)

Test 2: Partner Registration

What you're testing: Register a partner for use in sync tests.

curl -v -X POST http://127.0.0.1:7676/admin/partners/register \
  -u admin:changeme \
  -H "Content-Type: application/json" \
  -d '{
    "id": "ssp_test",
    "name": "Test SSP",
    "api_key": "my-secret-api-key",
    "allowed_return_domains": ["sync.example.com"],
    "source_domain": "ssp-test.example.com",
    "bidstream_enabled": true
  }'

What to look for:

  • Status 201 Created with JSON body containing "created": true
  • Response body has "id": "ssp_test", "bidstream_enabled": true
  • Response does not contain api_key or api_key_hash (sensitive fields redacted)

Error cases to verify:

# No auth → 401
curl -v -X POST http://127.0.0.1:7676/admin/partners/register \
  -H "Content-Type: application/json" \
  -d '{"id":"x","name":"x","api_key":"x","allowed_return_domains":["x.com"],"source_domain":"x.com"}'

# Wrong auth → 401
curl -v -X POST http://127.0.0.1:7676/admin/partners/register \
  -u admin:wrong \
  -H "Content-Type: application/json" \
  -d '{"id":"x","name":"x","api_key":"x","allowed_return_domains":["x.com"],"source_domain":"x.com"}'
  • Both return 401 with WWW-Authenticate: Basic header

Test 3: Pixel Sync (GET /sync)

What you're testing: Partner pixel fires and writes a UID into the identity graph.

Prerequisites: Complete Test 1 (have an EC ID) and Test 2 (have a registered partner).

EC_ID="<your-ec-id-from-test-1>"
curl -v -H "Cookie: ts-ec=$EC_ID" \
  "http://127.0.0.1:7676/sync?partner=ssp_test&uid=user-abc-123&return=https://sync.example.com/done"

What to look for:

  • Status 302 Found
  • Location header is https://sync.example.com/done?ts_synced=1
  • Server logs show KV write activity (if log level is debug)

Error cases:

# No EC cookie → redirect with ts_synced=0&ts_reason=no_ec
curl -v "http://127.0.0.1:7676/sync?partner=ssp_test&uid=user-abc-123&return=https://sync.example.com/done"

# Unknown partner → 400
curl -v -H "Cookie: ts-ec=$EC_ID" \
  "http://127.0.0.1:7676/sync?partner=unknown&uid=x&return=https://sync.example.com/done"

# Return URL not in allowlist → 400
curl -v -H "Cookie: ts-ec=$EC_ID" \
  "http://127.0.0.1:7676/sync?partner=ssp_test&uid=x&return=https://evil.com/steal"
  • No-EC case: 302 to return URL with ?ts_synced=0&ts_reason=no_ec
  • Unknown partner: 400
  • Bad return URL: 400

Test 4: Identity Lookup (GET /identify)

What you're testing: After syncing, the identity graph returns partner UIDs.

Prerequisites: Complete Test 3 (have a synced UID).

EC_ID="<your-ec-id>"
curl -v -H "Cookie: ts-ec=$EC_ID" http://127.0.0.1:7676/identify

What to look for:

  • Status 200
  • Body is JSON: {"ec":"<ec_id>","consent":"ok","degraded":false,"uids":{"ssp_test":"user-abc-123"},"eids":[...]}
  • Response headers include:
    • x-ts-ec: <ec_id>
    • x-ts-ec-consent: ok
    • x-ts-eids: <base64 string>
    • x-ts-ssp_test: user-abc-123

No EC cookie:

curl -v http://127.0.0.1:7676/identify
  • Status 204 No Content (empty body)

With CORS origin:

curl -v -H "Origin: https://test-publisher.com" \
  -H "Cookie: ts-ec=$EC_ID" \
  http://127.0.0.1:7676/identify
  • Access-Control-Allow-Origin: https://test-publisher.com
  • Access-Control-Allow-Credentials: true
  • Vary: Origin

CORS preflight:

curl -v -X OPTIONS \
  -H "Origin: https://test-publisher.com" \
  -H "Access-Control-Request-Method: GET" \
  http://127.0.0.1:7676/identify
  • Status 204
  • CORS headers present

Non-publisher origin → rejection:

curl -v -H "Origin: https://evil.com" \
  -H "Cookie: ts-ec=$EC_ID" \
  http://127.0.0.1:7676/identify
  • Status 403

Test 5: S2S Batch Sync (POST /api/v1/sync)

What you're testing: Server-to-server batch UID writes via authenticated API.

Prerequisites: Test 2 (have a partner with known api_key).

First, get an EC hash (the 64-hex prefix before the dot):

EC_HASH=$(echo "$EC_ID" | cut -d. -f1)
FUTURE=$(($(date +%s) + 10))

[!warning] Timestamp must be newer than existing sync
The KV identity graph keeps the most recent sync per partner (existing.synced >= incoming is treated as stale). Use a timestamp slightly in the future (+10) to guarantee the batch write overwrites a recent pixel sync.

curl -v -X POST http://127.0.0.1:7676/api/v1/sync \
  -H "Authorization: Bearer my-secret-api-key" \
  -H "Content-Type: application/json" \
  -d "{
    \"mappings\": [
      {\"ssc_hash\": \"$EC_HASH\", \"partner_uid\": \"batch-uid-99\", \"timestamp\": $FUTURE}
    ]
  }"

What to look for:

  • Status 200
  • Body: {"accepted":1,"rejected":0,"errors":[]}
  • Verify via identify: curl -H "Cookie: ts-ec=$EC_ID" http://127.0.0.1:7676/identify now shows "ssp_test":"batch-uid-99" (overwritten from pixel sync)

Error cases:

# No auth → 401
curl -v -X POST http://127.0.0.1:7676/api/v1/sync \
  -H "Content-Type: application/json" \
  -d '{"mappings":[{"ssc_hash":"a]","partner_uid":"x","timestamp":1}]}'

# Wrong auth → 401
curl -v -X POST http://127.0.0.1:7676/api/v1/sync \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{"mappings":[{"ssc_hash":"a","partner_uid":"x","timestamp":1}]}'

# Invalid hash → 200 or 207 with per-mapping rejection
curl -v -X POST http://127.0.0.1:7676/api/v1/sync \
  -H "Authorization: Bearer my-secret-api-key" \
  -H "Content-Type: application/json" \
  -d '{"mappings":[{"ssc_hash":"not-64-hex","partner_uid":"x","timestamp":1}]}'
  • No auth: 401 with {"error":"invalid_token"}
  • Wrong auth: 401
  • Invalid hash: 200 with rejected: 1 and errors: [{"index":0,"reason":"invalid_ec_hash"}]

Test 6: Consent Withdrawal (GPC)

What you're testing: Sec-GPC: 1 header triggers EC cookie deletion.

EC_ID="<your-ec-id>"

# Request with GPC → cookie should be expired
curl -v -H "Cookie: ts-ec=$EC_ID" -H "Sec-GPC: 1" http://127.0.0.1:7676/

What to look for:

  • Response contains Set-Cookie: ts-ec=; Domain=.test-publisher.com; Path=/; Secure; SameSite=Lax; Max-Age=0
  • The Max-Age=0 means the cookie is being deleted

Then verify identify is denied:

curl -v -H "Cookie: ts-ec=$EC_ID" -H "Sec-GPC: 1" http://127.0.0.1:7676/identify
  • Status 403 with body {"consent":"denied"}

Test 7: Auction Bidstream Decoration

What you're testing: Auction responses include EC identity headers.

[!note] Auction testing requires a configured Prebid backend
If Prebid is not configured, the auction endpoint may return an error. The key thing to verify is that EC headers appear on the response regardless of auction outcome.

EC_ID="<your-ec-id>"
curl -v -X POST http://127.0.0.1:7676/auction \
  -H "Cookie: ts-ec=$EC_ID" \
  -H "Content-Type: application/json" \
  -d '{"slots":[]}'

What to look for in response headers:

  • x-ts-ec: <ec_id> — EC ID is propagated
  • x-ts-ec-consent: ok — consent status
  • x-ts-eids: <base64> — EID array (if any partner syncs exist)
  • x-ts-ssp_test: <uid> — per-partner UID headers (if synced)

Test 8: Pull Sync (server logs only)

What you're testing: Background pull sync fires after organic requests.

[!warning] Pull sync requires registered partners with pull_sync_enabled
You need a partner registered with pull sync fields set, and a reachable HTTPS endpoint. In local testing, pull sync will log warnings about missing config or unreachable backends.

Register a pull-enabled partner:

curl -X POST http://127.0.0.1:7676/admin/partners/register \
  -u admin:changeme \
  -H "Content-Type: application/json" \
  -d '{
    "id": "pull_ssp",
    "name": "Pull SSP",
    "api_key": "pull-api-key",
    "allowed_return_domains": ["sync.example.com"],
    "source_domain": "pull-ssp.example.com",
    "pull_sync_enabled": true,
    "pull_sync_url": "https://pull.example.com/sync",
    "pull_sync_allowed_domains": ["pull.example.com"],
    "pull_sync_ttl_sec": 3600,
    "pull_sync_rate_limit": 10,
    "ts_pull_token": "my-pull-token"
  }'

Then make an organic request and check server logs (stderr):

curl -H "Cookie: ts-ec=$EC_ID" http://127.0.0.1:7676/

What to look for in Viceroy stderr:

  • Pull sync: enumerated N partners (M pull-enabled) — confirms dispatch was attempted
  • Errors about unreachable backend are expected in local testing
  • Pull sync should not fire on /sync, /identify, /auction, or /admin paths

Verify pull sync does NOT fire on non-organic routes:

curl -H "Cookie: ts-ec=$EC_ID" http://127.0.0.1:7676/identify
# Check logs — should NOT see "Pull sync: enumerated" after this request

Quick Full-Loop Script

Run the full lifecycle in one shot:

BASE="http://127.0.0.1:7676"

echo "=== 1. Generate EC ==="
RESP=$(curl -s -D /dev/stderr "$BASE/" 2>&1)
EC_ID=$(echo "$RESP" | grep -i 'x-ts-ec:' | head -1 | awk '{print $2}' | tr -d '\r')
EC_HASH=$(echo "$EC_ID" | cut -d. -f1)
echo "EC_ID=$EC_ID"
echo "EC_HASH=$EC_HASH"

echo ""
echo "=== 2. Register partner ==="
curl -s -X POST "$BASE/admin/partners/register" \
  -u admin:changeme \
  -H "Content-Type: application/json" \
  -d '{"id":"demo","name":"Demo","api_key":"demo-key","allowed_return_domains":["sync.example.com"],"source_domain":"demo.example.com","bidstream_enabled":true}' | python3 -m json.tool

echo ""
echo "=== 3. Pixel sync ==="
curl -s -D /dev/stderr -H "Cookie: ts-ec=$EC_ID" \
  "$BASE/sync?partner=demo&uid=pixel-uid-1&return=https://sync.example.com/done" 2>&1 | grep -i location

echo ""
echo "=== 4. Identify ==="
curl -s -H "Cookie: ts-ec=$EC_ID" "$BASE/identify" | python3 -m json.tool

echo ""
echo "=== 5. Batch sync ==="
FUTURE=$(($(date +%s) + 10))
curl -s -X POST "$BASE/api/v1/sync" \
  -H "Authorization: Bearer demo-key" \
  -H "Content-Type: application/json" \
  -d "{\"mappings\":[{\"ssc_hash\":\"$EC_HASH\",\"partner_uid\":\"batch-uid-2\",\"timestamp\":$FUTURE}]}" | python3 -m json.tool

echo ""
echo "=== 6. Identify again (should show batch-uid-2) ==="
curl -s -H "Cookie: ts-ec=$EC_ID" "$BASE/identify" | python3 -m json.tool

echo ""
echo "=== 7. Consent withdrawal ==="
curl -s -D /dev/stderr -H "Cookie: ts-ec=$EC_ID" -H "Sec-GPC: 1" "$BASE/" 2>&1 | grep -i 'set-cookie.*ts-ec'

What to Watch in Server Logs

Set RUST_LOG=debug when starting Viceroy for maximum visibility:

RUST_LOG=debug viceroy target/wasm32-wasip1/release/trusted-server-adapter-fastly.wasm \
  -C fastly.toml --addr 127.0.0.1:7676
Log pattern Meaning
EC generated for client New EC ID created
Proxying request to publisher_origin Organic route proxying to origin
Pull sync: enumerated N partners Pull sync dispatch attempted
Pull sync: rate-limited partner Hourly rate limit hit
KV CAS conflict, retrying Concurrent write detected, auto-retry
KV entry not found for hash First-time EC, no identity graph yet
Sync pixel: upsert partner Pixel sync writing to KV
Batch sync: processing N mappings Batch sync in progress
finalize_response: setting EC cookie Cookie being written to response
finalize_response: expiring EC cookie Consent withdrawal cookie deletion

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

…igration

- Add ec/ module with EcContext lifecycle, generation, cookies, and consent
- Compute cookie domain from publisher.domain, move EC cookie helpers
- Fix auction consent gating, restore cookie_domain for non-EC cookies
- Add integration proxy revocation, refactor EC parsing, clean up ec_hash
- Remove fresh_id and ec_fresh per EC spec §12.1
- Migrate [edge_cookie] config to [ec] per spec §14
Implement Story 3 (#536): KV-backed identity graph with compare-and-swap
concurrency, partner ID upserts, tombstone writes for consent withdrawal,
and revive semantics. Includes schema types, metadata, 300s last-seen
debounce, and comprehensive unit tests.

Also incorporates earlier foundation work: EC module restructure, config
migration from [edge_cookie] to [ec], cookie domain computation, consent
gating fixes, and integration proxy revocation support.
Implement Story 4 (#537): partner KV store with API key hashing,
POST /admin/partners/register with basic-auth protection, strict
field validation (ID format, URL allowlists, domain normalization),
and pull-sync config validation. Includes index-based API key lookup
and comprehensive unit tests.
Implement Story 5 (#538): centralize EC cookie set/delete and KV
tombstone writes in finalize_response(), replacing inline mutation
scattered across publisher and proxy handlers. Adds consent-withdrawal
cleanup, EC header propagation on proxy requests, and docs formatting.
Implement Story 8 (#541): POST /api/v1/sync with Bearer API key auth,
per-partner rate limiting, batch size cap, per-mapping validation and
rejection reasons, 200/207 response semantics, tolerant Bearer parsing,
and KV-abort on store unavailability.
Implement Story 9 (#542): server-to-server pull sync that runs after
send_to_client() on organic traffic only. Refactors the Fastly adapter
entrypoint from #[fastly::main] to explicit Request::from_client() +
send_to_client() to enable post-send background work.

Pull sync enumerates pull-enabled partners, checks staleness against
pull_sync_ttl_sec, validates URL hosts against the partner allowlist,
enforces hourly rate limits, and dispatches concurrent outbound GETs
with Bearer auth. Responses with uid:null or 404 are no-ops; valid
UIDs are upserted into the identity graph.

Includes EC ID format validation to prevent dispatch on spoofed values,
partner list_registered() for KV store enumeration, and configurable
pull_sync_concurrency (default 3).
Implement Story 11 (#544): Viceroy-driven E2E tests covering full EC
lifecycle (generation, pixel sync, identify, batch sync, consent
withdrawal, auth rejection). Adds EC test helpers with manual cookie
tracking, minimal origin server with graceful shutdown, and required
KV store fixtures. Fixes integration build env vars.
Consolidate is_valid_ec_hash and current_timestamp into single canonical
definitions to eliminate copy-paste drift across the ec/ module tree. Fix
serialization error variants in admin and batch_sync to use Ec instead of
Configuration. Add scaling and design-decision documentation for partner
store enumeration, rate limiter burstiness, and plaintext pull token storage.
Use test constructors consistently in identify and finalize tests.
@ChristianPavilonis ChristianPavilonis marked this pull request as ready for review March 26, 2026 17:33
@ChristianPavilonis
Copy link
Collaborator Author

ChristianPavilonis commented Mar 26, 2026

Questions for the Team

Q1: Admin auth model — Bearer token vs Basic auth (§13.2, §14.1, §17)

The spec says /admin/partners/register should use Bearer token in-handler auth with a new admin_token_hash setting, and the [[handlers]] pattern should be narrowed from "^/admin" to "^/admin/keys". The implementation uses the existing [[handlers]] Basic Auth system for all admin routes.

Did the spec author know about the existing Basic Auth [[handlers]] system? The current approach is arguably simpler and consistent with existing admin routes. If we keep Basic Auth:

  • No admin_token_hash setting needed
  • No handler pattern narrowing needed
  • No admin-route-scan test exclusion list needed
  • Trade-off: partners can't authenticate with a Bearer token — they need Basic Auth credentials

Recommendation: Follow up with spec author to align.

Q2: [[handlers]] pattern scope (§13.2)

Related to Q1 — the spec says narrow "^/admin" to "^/admin/keys". If we keep Basic Auth for all admin routes, this is moot. If we move to Bearer, this needs to change.

Q3: IPv6 normalization format (§4.1)

The spec says: "Split on :, take first 4 groups, join with :" → "2001:db8:85a3:0".

The implementation concatenates as zero-padded hex without separators → "20010db885a30000".

Both are internally consistent (same input always produces same output). The spec format is human-readable; the implementation format is more compact. Is there any external system that needs to match this format? If EC hashes are only compared internally, either format works. If partners or external tools need to independently compute EC hashes from IPs, we need to agree on one format.

Q4: client_ip as String vs IpAddr (§4.3)

Spec says Option<IpAddr>, implementation stores Option<String> (pre-normalized). The string is already normalized at capture time, which is actually more efficient (no re-normalization needed). Pull sync sends ip={ip_address} in the outbound URL — should that be the original IP or the normalized form? If partners need the original IP for their own resolution, we'd need to store both.

Q5: generate_if_needed() return type (§4.3)

Spec says this function "never returns an error" and callers "must NOT propagate with ?". The implementation returns Result<(), Report<TrustedServerError>>. Callers currently handle correctly with if let Err(err) = ... { log::warn!(...) }.

Should we move the log to the function or keep the Result for flexibility?

Q6: ec_store / partner_store optionality (§14.1)

Spec says these are required — omitting [ec] is a startup error, and EdgeCookie does not derive Default. The implementation has them as Option<String> with #[serde(default)], and Ec derives Default.

Was this intentional for incremental development? The spec wants a hard startup error if KV stores aren't configured. The optional approach is more forgiving but means a misconfigured deployment silently runs without KV. Related: generate_if_needed() takes kv: Option<&KvIdentityGraph> instead of &KvIdentityGraph.

Should we tighten this before merge, or is it acceptable for the current phase?

- Rename ssc_hash → ec_hash in batch sync wire format (§9.3)
- Strip x-ts-* prefix headers in copy_custom_headers (§15)
- Strip dynamic x-ts-<partner_id> headers in clear_ec_on_response (§5.2)
- Add PartnerNotFound and PartnerAuthFailed error variants (§16)
- Rename Ec error variant → EdgeCookie (§16)
- Validate EC IDs at read time, discard malformed values (§4.2)
- Add rotating hourly offset for pull sync partner dispatch (§10.3)
- Add _pull_enabled secondary index for O(1+N) pull sync reads (§13.1)
Copy link
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR introduces the full EC identity subsystem — partner registry, pixel sync, batch sync, pull sync, consent gating, KV graph operations, and admin endpoints. The architecture is coherent and several design choices are genuinely good, but there are six blocking issues that must be addressed before merging: dead code masking a broken auth path, missing UID length validation, a missing request body size limit, a wrong HTTP status code, an unreachable error variant, and an underspecified test expectation.

Blocking

🔧 Dead code / broken auth

  • verify_api_key is dead code — auth uses plain string equality: PartnerStore::verify_api_key uses subtle::ConstantTimeEq and SHA-256 hashing. The actual auth path in batch_sync.rs hashes the token and calls find_by_api_key_hash, which then does a plain record.api_key_hash != hash string comparison (line 663). The constant-time method is never called. Either wire verify_api_key into the auth path or remove it; currently the timing-safe comparison exists only on paper. (crates/trusted-server-core/src/ec/partner.rs line 682, crates/trusted-server-core/src/ec/batch_sync.rs line 101)

  • No UID length validation in batch sync or pixel sync: SyncMapping.partner_uid is checked only for non-emptiness (line 206, batch_sync.rs). An unbounded UID written to KV can grow the entry beyond Fastly's limits or inflate x-ts-eids unexpectedly. A max length (e.g. 512 bytes) should be validated here and in sync_pixel.rs SyncQuery.uid. (crates/trusted-server-core/src/ec/batch_sync.rs line 206, crates/trusted-server-core/src/ec/sync_pixel.rs line 74)

  • No request body size limit in admin partner registration: handle_register_partner calls req.take_body_bytes() without a size cap (line 154). A malicious caller can send a very large payload; deserializing it amplifies memory pressure on the WASM instance. Apply a cap (e.g. 64 KiB) before deserialization. (crates/trusted-server-core/src/ec/admin.rs line 154)

🔧 Wrong HTTP status codes

  • PartnerNotFound maps to HTTP 400 instead of 404: IntoHttpResponse::status_code returns StatusCode::BAD_REQUEST for PartnerNotFound (line 114). A missing partner is a 404 Not Found, not a client input error. Returning 400 misleads callers into thinking the request was malformed. (crates/trusted-server-core/src/error.rs line 114)

  • PartnerAuthFailed variant is never instantiated: The variant exists in the error enum and maps to StatusCode::UNAUTHORIZED, but no call site in the codebase constructs it. Auth failures in batch_sync.rs return early with inline error_response(StatusCode::UNAUTHORIZED, ...) instead. Either wire the variant into the auth path or remove it so the #[allow(dead_code)] suppression on the whole enum does not hide this. (crates/trusted-server-core/src/error.rs line 83)

❓ Ambiguous test expectation

  • IdentifyConsentDenied accepts both 403 and 204 — which is correct?: The test at line 625 accepts either status with the comment "may be 403 or 204 depending on whether the EC context reads the cookie before consent check." This means the test cannot detect a regression where one of these cases starts returning the wrong code. The spec should declare the single correct response for this scenario under Unknown jurisdiction with GPC=1, and the test should assert exactly that code. (crates/integration-tests/tests/frameworks/scenarios.rs line 625)

Non-blocking

👍 Praise

  • Two-phase EC lifecycle design (read_from_request / generate_if_needed): EcContext cleanly separates the read phase (called on every request) from the write phase (called only in organic handlers). This prevents accidental EC generation on read-only endpoints like /identify and makes the lifecycle explicit and auditable. (crates/trusted-server-core/src/ec/mod.rs lines 147–275)

  • CAS retry loop with conflict-safe tombstone revival: KvIdentityGraph::create_or_revive handles the race between a delete and a re-consent within the tombstone window using if_generation_match with a bounded retry loop, re-reading the generation on each conflict and checking whether another writer already revived the entry. This is the correct way to handle optimistic concurrency on Fastly KV. (crates/trusted-server-core/src/ec/kv.rs lines 251–328)

  • O(1) auth lookup via hashed secondary index with stale-index guard: find_by_api_key_hash resolves apikey:{sha256_hex}partner_idPartnerRecord in two KV reads, then re-verifies record.api_key_hash == hash to guard against stale entries from key rotation. This is the right defense in depth for an eventually-consistent secondary index. (crates/trusted-server-core/src/ec/partner.rs lines 621–672)

  • Consent gating is applied consistently at every sync write: Both pixel sync (sync_pixel.rs line 62) and batch sync (via UpsertResult::ConsentWithdrawn from kv.rs line 474) check consent before writing to KV. Tombstone entries block upserts at the KV layer, so a late-arriving sync cannot repopulate partner IDs after withdrawal. The defense is applied at the correct architectural boundary.

  • KvMetadata sidecar avoids streaming the full KV body for consent checks: The ok and country fields are duplicated in the 2048-byte metadata slot so that future batch sync fast paths can check consent without deserializing the full entry body. The metadata_fits_in_2048_bytes test guards the size contract. (crates/trusted-server-core/src/ec/kv_types.rs lines 77–89, 278–291)

♻️ Refactor

  • encode_eids_header truncation is O(N²): The function iterates from N down to 0, re-serializing and base64-encoding the full slice on each attempt. For a modest N (e.g. 64 partners), this is 64 serialization passes. A faster approach: binary-search on size, or serialize incrementally and measure as you go. (crates/trusted-server-core/src/ec/eids.rs line 117)

  • upsert() write order leaves stale secondary index on interruption: The documented order writes the apikey: index before the primary record. If the process is interrupted between those two writes, the index points to a partner ID that has not yet been persisted. Swapping the order (primary first, then index) would leave a missing index entry (self-healing on next auth lookup) rather than a dangling one. (crates/trusted-server-core/src/ec/partner.rs lines 443–453)

  • list_registered() is O(N) KV reads per pull sync dispatch: Each organic request post-send enumerates all partner keys and reads each one individually. The _pull_enabled secondary index already exists to avoid this scan — dispatch_pull_sync should call pull_enabled_partners() rather than list_registered(). (crates/trusted-server-core/src/ec/partner.rs line 297)

🌱 Seedling

  • No cap on partner ID count per KV entry: Each pixel/batch/pull sync writes a new key into entry.ids. With no upper bound, a KV entry can grow unbounded (Fastly KV enforces a 1 MB value limit). Consider capping ids at a configurable maximum (e.g. 50 partners). (crates/trusted-server-core/src/ec/kv.rs line 400)

  • Client IP forwarded to partner pull-sync endpoints: PullSyncContext stores and re-exports the client IP for use in pull sync query parameters. Forwarding a user's IP to a third-party partner endpoint is a privacy/GDPR concern that should be documented and reviewed against the publisher's data processing agreement. (crates/trusted-server-core/src/ec/pull_sync.rs lines 25–42)

  • Tombstone does not preserve original created timestamp: KvEntry::tombstone resets created to now (the withdrawal timestamp). This loses audit information about when the identity was originally established, which may be needed for GDPR records of processing. The doc comment acknowledges the tradeoff but the decision should be revisited. (crates/trusted-server-core/src/ec/kv_types.rs line 152)

  • current_timestamp() silently returns epoch 0 on failure: If SystemTime::now() fails (unusual on wasm32-wasip1 but possible in certain test harnesses), the function returns 0. A timestamp of 0 passed to synced comparisons could cause all subsequent sync writes to appear stale. Consider logging a warning on the fallback path. (crates/trusted-server-core/src/ec/mod.rs line 425)

📝 Note

  • No Access-Control-Max-Age on /identify preflight responses: cors_preflight_identify sets Access-Control-Allow-Origin, Allow-Methods, Allow-Headers, and Vary but omits Access-Control-Max-Age. Without it, browsers use a very short default (5 seconds in Chrome), causing a preflight on every identify call from browser JS. Adding Access-Control-Max-Age: 600 would reduce preflight overhead significantly. (crates/trusted-server-core/src/ec/identify.rs line 209)

  • 7 missing integration test scenarios: The following EC scenarios have no integration test: pixel sync rate limiting, batch sync with ec_hash_not_found, batch sync with consent_withdrawn, batch sync size limit (> 1000 mappings), PartnerNotFound on pixel sync returns 400, admin registration validation errors, and pull sync dispatch. Each is a distinct failure mode that integration tests should cover.

⛏ Nitpick

  • normalize_ip IPv6 output format is undocumented as a stable contract: The function concatenates the first 4 segments as zero-padded hex without separators (e.g. "20010db885a30000"). If this format is stored in KV or logged for debugging, changing it would invalidate all existing EC hashes. Consider adding a doc comment explicitly marking the format as stable. (crates/trusted-server-core/src/ec/generation.rs line 27)

  • #[allow(dead_code)] on the entire error enum masks unused variants: The attribute on TrustedServerError (line 15) suppresses warnings for all variants, including PartnerAuthFailed which is genuinely unused. Apply the attribute per-variant only where needed, or remove it from the enum level so the compiler can surface real dead code. (crates/trusted-server-core/src/error.rs line 15)

📌 Out of scope

  • [synthetic][ec] config migration is a breaking deployment change: Any deployment that has trusted-server.toml with the old [synthetic] section will fail to start after this PR is deployed, unless the config is updated atomically. The migration should be documented in the PR description and coordinated with operators. This is a deployment concern, not a code defect.

CI Status

  • integration tests: PASS
  • browser integration tests: PASS
  • prepare integration artifacts: PASS

/// # Errors
///
/// Returns [`TrustedServerError::KvStore`] if the partner lookup fails.
pub fn verify_api_key(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Dead code — verify_api_key is never called

This method uses subtle::ConstantTimeEq for timing-safe comparison, but the actual auth path in batch_sync.rs hashes the token and calls find_by_api_key_hash, which then does a plain record.api_key_hash != hash string comparison (line 663). The constant-time protection exists only on paper.

Fix: Either route batch sync auth through verify_api_key, or remove this method. If find_by_api_key_hash remains the auth path, apply constant-time comparison there:

use subtle::ConstantTimeEq;
if !record.api_key_hash.as_bytes().ct_eq(hash.as_bytes()).into() {
    return Ok(None);
}

continue;
}

if mapping.partner_uid.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 No UID length validation

partner_uid is checked only for non-emptiness. An unbounded value written to the KV identity graph can grow the entry beyond Fastly's 1 MB limit and inflate x-ts-eids headers unpredictably.

Fix: Add a maximum length check:

const MAX_PARTNER_UID_BYTES: usize = 512;

if mapping.partner_uid.is_empty() || mapping.partner_uid.len() > MAX_PARTNER_UID_BYTES {
    errors.push(MappingError { index: idx, reason: REASON_INVALID_PARTNER_UID });
    continue;
}

The same cap should be applied in sync_pixel.rs.

mut req: Request,
) -> Result<Response, Report<TrustedServerError>> {
// Parse request body.
let body_bytes = req.take_body_bytes();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 No request body size limit

req.take_body_bytes() reads the entire body before deserialization. A malicious caller can send an arbitrarily large payload, amplifying memory pressure on the WASM instance.

Fix: Cap the body before deserializing:

const MAX_REGISTER_BODY_BYTES: usize = 64 * 1024; // 64 KiB

let body_bytes = req.take_body_bytes();
if body_bytes.len() > MAX_REGISTER_BODY_BYTES {
    return Err(bad_request("request body too large"));
}

Self::Proxy { .. } => StatusCode::BAD_GATEWAY,
Self::Ec { .. } => StatusCode::INTERNAL_SERVER_ERROR,
Self::EdgeCookie { .. } => StatusCode::INTERNAL_SERVER_ERROR,
Self::PartnerNotFound { .. } => StatusCode::BAD_REQUEST,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 PartnerNotFound should map to 404, not 400

A missing partner is a lookup failure, not a malformed request. Returning 400 misleads callers into thinking their request was syntactically invalid.

Fix:

Self::PartnerNotFound { .. } => StatusCode::NOT_FOUND,


/// Partner authentication failed (invalid or missing credentials).
#[display("Partner auth failed: {partner_id}")]
PartnerAuthFailed { partner_id: String },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 PartnerAuthFailed variant is never instantiated

No call site constructs TrustedServerError::PartnerAuthFailed. Auth failures in batch_sync.rs return early with error_response(StatusCode::UNAUTHORIZED, ...) directly, bypassing this variant entirely.

Fix: Either route auth failures through this variant so the status-code mapping in IntoHttpResponse is actually used, or remove the variant. Also remove the enum-level #[allow(dead_code)] and apply it per-variant only where genuinely needed so the compiler can surface dead code.

///
/// Returns an error if JSON serialization fails.
pub fn encode_eids_header(eids: &[Eid]) -> Result<(String, bool), Report<TrustedServerError>> {
for size in (0..=eids.len()).rev() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ O(N²) truncation — re-serializes the full slice on every attempt

The loop iterates from N down to 0, calling serde_json::to_vec and BASE64.encode on each pass. For 64 partners this is 64 full serializations.

For the expected partner counts this is unlikely to be a hot-path bottleneck, but worth a comment noting the complexity. Binary search on size would reduce to O(N log N) if this ever becomes a concern.

std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_secs())
.unwrap_or(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 current_timestamp() silently returns epoch 0 on failure

If SystemTime::now() returns an error, the function returns 0. A synced timestamp of 0 would cause all subsequent sync writes to appear stale, silently breaking the identity graph without any log output.

Suggestion: Log a warning on the fallback path:

.unwrap_or_else(|_| {
    log::warn!("SystemTime::now() failed — using epoch 0 as timestamp fallback");
    0
})

/// preserved — reading the existing entry first would add latency on
/// the consent-withdrawal hot path, and the tombstone expires in 24h.
#[must_use]
pub fn tombstone(now: u64) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🌱 Tombstone does not preserve the original created timestamp

The doc comment acknowledges this tradeoff. If GDPR records of processing require knowing when an identity was originally established, this information is lost when the tombstone overwrites created. Consider reading the existing entry before writing the tombstone and preserving created when available.

CorsDecision::Denied
}

fn apply_cors_headers(response: &mut Response, origin: &str) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Missing Access-Control-Max-Age on preflight responses

Without this header, browsers use a very short default preflight cache lifetime (5 seconds in Chrome), causing a preflight on every identify call from browser JS.

Suggestion: Add to apply_cors_headers:

response.set_header("access-control-max-age", "600");

/// where devices rotate their interface identifier (lower 64 bits).
/// The first 4 segments are hex-encoded without separators.
/// IPv4 addresses are returned unchanged.
fn normalize_ip(ip: IpAddr) -> String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize_ip IPv6 output format is undocumented as a stable contract

The format (4 segments concatenated as zero-padded hex without separators, e.g. "20010db885a30000") is used as HMAC input and therefore determines the EC hash. Changing the format would invalidate all existing EC cookies. Consider adding an explicit stability note to the doc comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants