diff --git a/NLSR_PATCH_7818_INTEGRATION.md b/NLSR_PATCH_7818_INTEGRATION.md
new file mode 100644
index 0000000..499049a
--- /dev/null
+++ b/NLSR_PATCH_7818_INTEGRATION.md
@@ -0,0 +1,253 @@
+# NLSR Patch 7818 Integration Guide
+
+## Overview
+
+This guide documents the changes required to integrate NLSR Gerrit Change 7818 ("lsdb: recover lost LSA seqNo from network via sync") into the Named Data testbed.
+
+**Change 7818:** https://gerrit.named-data.net/c/NLSR/+/7818
+
+**Problem Solved (Issue #5386):** When an NLSR router's sequence number file is corrupted or lost, the router would restart with sequence number 0 and other routers would ignore its LSAs because they expected higher sequence numbers. Patch 7818 enables automatic recovery of the sequence number from the network via sync protocol.
+
+---
+
+## Changes Required
+
+### 1. NLSR Dockerfile (`nlsr.Dockerfile`)
+
+The Dockerfile builds NLSR with patch 7818 applied using Named Data's base images.
+
+**Key changes:**
+
+- Use `ghcr.io/named-data/ndn-cxx-build` and `ghcr.io/named-data/ndn-cxx-runtime` as base images (not Ubuntu) - this ensures proper library paths and dependencies
+- Set `ENV HOME=/config` in the runtime stage - required for ndn-cxx to find the identity database (PIB)
+- Build psync from source (not available in base images)
+- Use patch revision 6 (latest): `refs/changes/18/7818/6`
+- Configure NLSR with `--with-psync` to find psync
+- Install `libboost-iostreams1.83.0` for runtime
+
+```dockerfile
+# Critical environment variable - ndn-cxx looks for PIB at $HOME/.ndn/
+ENV HOME=/config
+```
+
+### 2. Docker Compose Override (`docker-compose.override.yml`)
+
+Override to use the patched NLSR image:
+
+```yaml
+services:
+  nlsr:
+    image: ghcr.io/a-thieme/nlsr:patch-7818-v3
+```
+
+### 3. NLSR Config Template (`templates/nlsr/nlsr.conf.j2`)
+
+**Critical fix:** The advertising section format changed in the patched NLSR.
+
+**Old format (rejected by patch 7818):**
+```jinja
+advertising
+{
+  {% for prefix in advertised_prefixes %}
+  prefix {{ prefix }}
+  {% endfor %}
+}
+```
+
+**New format (required by patch 7818):**
+```jinja
+advertising
+{
+  {% for prefix in advertised_prefixes %}
+  {{ prefix }} 1
+  {% endfor %}
+}
+```
+
+The format changed from `prefix ` to ` ` (key-value format).
+
+---
+
+## Building the Patched Image
+
+```bash
+# Build the image
+docker build -t ghcr.io//nlsr:patch-7818 -f nlsr.Dockerfile .
+
+# Push to registry
+docker push ghcr.io//nlsr:patch-7818
+```
+
+## Deploying on Testbed Nodes
+
+On each testbed node:
+
+```bash
+# Pull or update the image
+docker-compose pull
+
+# Restart NLSR
+docker-compose up -d nlsr
+```
+
+---
+
+## Verification
+
+### Check NLSR Version
+```bash
+docker exec testbed-nlsr-1 nlsr -V
+# Expected: 24.08+git.7e80068b (or similar with 7e80068 commit)
+```
+
+### Check Identity Loading
+```bash
+docker logs testbed-nlsr-1 2>&1 | grep identity
+# Expected: Should show proper router identity, NOT "not found"
+```
+
+### Verify Sequence Recovery Works
+
+1. Corrupt the sequence file:
+```bash
+docker stop testbed-nlsr-1
+echo -e "AdjLsaSeqNo: 5\nNameLsaSeqNo: 5" > /path/to/state/nlsrSeqNo.txt
+docker start testbed-nlsr-1
+```
+
+2. Check recovery:
+```bash
+docker logs testbed-nlsr-1 2>&1 | grep "Received sync update for own router"
+# Or check sequence numbers:
+docker exec testbed-nlsr-1 cat /config/state/nlsrSeqNo.txt
+# Should show higher values than 5 (recovered from network)
+```
+
+### Check Neighbor Connectivity
+
+From a neighbor node (e.g., Singapore):
+```bash
+docker exec testbed-nlsr-1 nlsrc lsdb | grep -A5 osaka
+# Should show current sequence numbers (not stale)
+```
+
+---
+
+## Troubleshooting
+
+### "Router identity not found" / "NLSR is running without security"
+
+**Cause:** The `HOME=/config` environment variable is missing.
+
+**Fix:** Ensure the Dockerfile includes `ENV HOME=/config`
+
+### "Invalid cost format; only integers are allowed"
+
+**Cause:** Advertising section uses old format `prefix ` instead of ` `
+
+**Fix:** Update `templates/nlsr/nlsr.conf.j2` to use `{{ prefix }} 1` format
+
+### Sequence numbers stuck or not increasing
+
+**Possible causes:**
+1. Security not working (check identity logs)
+2. Firewall blocking UDP port 6363
+3. Network connectivity issues between nodes
+
+---
+
+## Image Tags
+
+| Tag | Description |
+|-----|-------------|
+| `patch-7818-v1` | Initial build, wrong advertising format |
+| `patch-7818-v2` | Fixed advertising format, missing HOME=/config |
+| `patch-7818-v3` | **Current working version** - all fixes applied |
+
+---
+
+---
+
+## After Patch Merge: "Just Works" Deployment (Option B)
+
+This section describes the **minimal changes** needed in named-data/testbed once patch 7818 is merged into NLSR, assuming **no backward compatibility** (Option B).
+
+### Assumption
+
+Patch 7818 is merged into NLSR master branch. A future NLSR release/tag includes the patch. The testbed simply needs to use that version.
+
+### Required Changes
+
+#### 1. named-data/testbed
+
+**Single required change:**
+
+Update `templates/nlsr/nlsr.conf.j2` - advertising section format:
+
+```jinja
+advertising
+{
+  {% for prefix in advertised_prefixes %}
+  {{ prefix }} 1
+  {% endfor %}
+}
+```
+
+That's it. **One line change.**
+
+Everything else works automatically because:
+- Named Data's CI builds NLSR Docker images with the merged patch
+- Testbed's docker-compose.yml already references `ghcr.io/named-data/nlsr:latest` (or a tagged version)
+- The NLSR Docker image includes `ENV HOME=/config` (already in official Dockerfile)
+
+#### 2. NLSR Gerrit Patch (7818)
+
+**No changes needed.** The patch works as-is. The breaking config format change is intentional and documented.
+
+### How It Works After Merge
+
+```
+named-data/testbed:
+├── docker-compose.yml ──► References NLSR image (auto-includes patch)
+├── templates/nlsr/nlsr.conf.j2 ──► Generates format ✓
+└── Dockerfile ──► Uses official NLSR image (already has HOME=/config)
+```
+
+### Rolling Update Process
+
+When patch 7818 is merged and deployed:
+
+```bash
+# 1. Pull latest NLSR image (includes merged patch)
+docker-compose pull nlsr
+
+# 2. Config auto-regenerates via master container cron job, or manually:
+python3 framework/main.py --dry
+
+# 3. Restart NLSR on each node
+docker-compose up -d nlsr
+```
+
+### What NOT to Change
+
+The following files do **not** need modification:
+
+- `nlsr.Dockerfile` - Only needed for custom builds before patch is merged
+- `docker-compose.override.yml` - Per-deployer override, not in upstream
+
+### Summary
+
+| Repository | Change Required | When |
+|------------|----------------|------|
+| named-data/testbed | Update `templates/nlsr/nlsr.conf.j2` | One-time after patch merge |
+| NLSR (gerrit 7818) | None | Already works |
+
+**Result:** After the single template update, deploying new NLSR versions "just works."
+
+---
+
+## Related Files
+
+- `nlsr.Dockerfile` - Multi-stage build for patched NLSR (custom, not needed after merge)
+- `docker-compose.override.yml` - Override to use patched image (per-deployer, not upstream)
+- `templates/nlsr/nlsr.conf.j2` - NLSR configuration template (the only file needing update)
diff --git a/docker-compose.override.yml b/docker-compose.override.yml
new file mode 100644
index 0000000..6f3225e
--- /dev/null
+++ b/docker-compose.override.yml
@@ -0,0 +1,30 @@
+# docker-compose.override.yml - Override nlsr service for local build
+#
+# Docker Compose automatically reads this file when running `docker-compose up`.
+# See: https://docs.docker.com/compose/compose-file/13-override/
+#
+# Workflow to build and deploy patched NLSR:
+# 1. On local machine with Docker and ghcr.io access:
+#    docker build -t ghcr.io/a-thieme/nlsr:patch-7818-v3 -f nlsr.Dockerfile .
+#    docker push ghcr.io/a-thieme/nlsr:patch-7818-v3
+#
+# 2. Update the `image:` value below to your registry image
+#
+# 3. On remote machine:
+#    docker-compose pull
+#    docker-compose up -d nlsr
+
+name: testbed
+services:
+  nlsr:
+    # Build from local Dockerfile (for local testing only)
+    # Uncomment build: to build locally instead of pulling the pre-built image
+    # build:
+    #   context: .
+    #   dockerfile: nlsr.Dockerfile
+
+    # Image from your registry (push built image to ghcr.io first)
+    image: ghcr.io/a-thieme/nlsr:patch-7818-v3
+
+    # To use pre-built Named Data image instead:
+    # image: ghcr.io/named-data/nlsr:latest
diff --git a/nlsr.Dockerfile b/nlsr.Dockerfile
new file mode 100644
index 0000000..343f1f2
--- /dev/null
+++ b/nlsr.Dockerfile
@@ -0,0 +1,63 @@
+# nlsr.Dockerfile - Build NLSR with Gerrit patch 7818 applied
+# Patch 7818: Recover lost LSA seqNo from network via sync
+#
+# Uses Named Data's base images for proper library paths
+# Build and push to your own ghcr.io registry:
+#   docker build -t ghcr.io//nlsr:patch-7818 -f nlsr.Dockerfile .
+#   docker push ghcr.io//nlsr:patch-7818
+
+# Stage 1: Build using Named Data's build environment
+FROM ghcr.io/named-data/ndn-cxx-build AS builder
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Ensure SSL certs are available for git and install boost iostreams
+RUN apt-get update && apt-get install -y ca-certificates libboost-iostreams-dev && rm -rf /var/lib/apt/lists/*
+
+# Build psync from source (psync is not in the base image)
+WORKDIR /build
+RUN git clone --depth 1 --branch master https://github.com/named-data/psync.git psync
+WORKDIR /build/psync
+RUN ./waf configure --prefix=/usr --libdir=/usr/lib && \
+    ./waf build -j$(nproc) && ./waf install
+
+# Clone NLSR and apply patch
+WORKDIR /build
+RUN git clone --depth 1 --branch master https://github.com/named-data/NLSR.git nlsr
+WORKDIR /build/nlsr
+
+# Fetch patch 7818 from Gerrit
+# Use latest revision (change ref if needed: /1, /2, ..., /6)
+ARG NLSR_PATCH_REF=refs/changes/18/7818/6
+RUN git fetch https://gerrit.named-data.net/NLSR.git ${NLSR_PATCH_REF} && \
+    git checkout FETCH_HEAD
+
+# Build NLSR using waf with --with-psync to find psync
+RUN ./waf configure --prefix=/usr --libdir=/usr/lib --with-psync && \
+    ./waf build -j$(nproc) && ./waf install
+
+# Stage 2: Runtime image using Named Data's runtime base
+FROM ghcr.io/named-data/ndn-cxx-runtime
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV HOME=/config
+
+# Runtime dependencies for NLSR
+RUN apt-get update && apt-get install -y \
+    libsqlite3-0 \
+    libpcap0.8 \
+    libboost-iostreams1.83.0 \
+    && rm -rf /var/lib/apt/lists/* \
+    && apt-get clean
+
+# Copy built NLSR and psync artifacts from builder
+COPY --from=builder /usr/bin/nlsr /usr/bin/nlsr
+COPY --from=builder /usr/bin/nlsrc /usr/bin/nlsrc
+COPY --from=builder /usr/lib/libPSync.so* /usr/lib/
+
+# Default entrypoint matches original image
+ENTRYPOINT ["/usr/bin/nlsr"]
+CMD ["-f", "/config/nlsr.conf"]
+
+LABEL org.opencontainers.image.source=https://github.com/named-data/NLSR
+LABEL org.opencontainers.image.description="NLSR with patch 7818 (LSA seqNo recovery via sync)"
\ No newline at end of file
diff --git a/templates/nlsr/nlsr.conf.j2 b/templates/nlsr/nlsr.conf.j2
index d739d7f..80798f4 100644
--- a/templates/nlsr/nlsr.conf.j2
+++ b/templates/nlsr/nlsr.conf.j2
@@ -54,7 +54,7 @@ fib
 advertising
 {
   {% for prefix in advertised_prefixes %}
-  prefix {{ prefix }}
+  {{ prefix }} 1
   {% endfor %}
 }