253 changes: 253 additions & 0 deletions NLSR_PATCH_7818_INTEGRATION.md
@@ -0,0 +1,253 @@
# NLSR Patch 7818 Integration Guide

## Overview

This guide documents the changes required to integrate NLSR Gerrit Change 7818 ("lsdb: recover lost LSA seqNo from network via sync") into the Named Data testbed.

**Change 7818:** https://gerrit.named-data.net/c/NLSR/+/7818

**Problem Solved (Issue #5386):** When an NLSR router's sequence number file is corrupted or lost, the router restarts with sequence number 0, and other routers ignore its LSAs because they already hold higher sequence numbers for it. Patch 7818 lets the router recover its sequence number automatically from the network via the sync protocol.
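The failure and the fix can be illustrated with a toy Python model. The `Neighbor` and `Router` classes below are hypothetical and only sketch the idea under two assumptions: a router keeps the highest sequence number it has accepted per origin and drops anything not strictly newer, and recovery means adopting the larger of the local value and the value seen in sync. Patch 7818's actual C++ logic may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    """A remote router's LSDB view: highest seqNo accepted per origin router."""
    seen: dict[str, int] = field(default_factory=dict)

    def receive(self, origin: str, seq_no: int) -> bool:
        # Only strictly newer LSAs replace what is stored; anything else is stale.
        if seq_no <= self.seen.get(origin, 0):
            return False
        self.seen[origin] = seq_no
        return True


@dataclass
class Router:
    """The restarting router; seq_no starts at 0 when its seqNo file is lost."""
    name: str
    seq_no: int = 0

    def originate(self) -> int:
        # Each new LSA carries the next sequence number.
        self.seq_no += 1
        return self.seq_no

    def on_own_sync_update(self, network_seq_no: int) -> None:
        # Recovery idea (assumed): adopt the newer number the network still has for us.
        self.seq_no = max(self.seq_no, network_seq_no)


if __name__ == "__main__":
    neighbor = Neighbor()
    osaka = Router("osaka")
    for _ in range(40):
        neighbor.receive(osaka.name, osaka.originate())  # neighbor now holds 40

    osaka = Router("osaka")  # restart after losing the seqNo file
    print(neighbor.receive(osaka.name, osaka.originate()))  # False: 1 is stale

    osaka.on_own_sync_update(neighbor.seen[osaka.name])  # sync reports 40
    print(neighbor.receive(osaka.name, osaka.originate()))  # True: 41 is newer
```

Without the recovery step, the restarted router would need to originate 40 more LSAs before its neighbors accepted any of them.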

---

## Changes Required

### 1. NLSR Dockerfile (`nlsr.Dockerfile`)

The Dockerfile builds NLSR with patch 7818 applied, on top of Named Data's base images.

**Key changes:**

- Use `ghcr.io/named-data/ndn-cxx-build` and `ghcr.io/named-data/ndn-cxx-runtime` as base images instead of plain Ubuntu; this ensures correct library paths and dependencies
- Set `ENV HOME=/config` in the runtime stage; ndn-cxx needs it to find the identity database (PIB)
- Build PSync from source (it is not included in the base images)
- Use patch revision 6 (the latest at the time of writing): `refs/changes/18/7818/6`
- Configure NLSR with `--with-psync` so that it finds PSync
- Install `libboost-iostreams1.83.0` in the runtime stage

```dockerfile
# Critical environment variable - ndn-cxx looks for PIB at $HOME/.ndn/
ENV HOME=/config
```
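The patch revision is selected through the `NLSR_PATCH_REF` build argument declared in `nlsr.Dockerfile`. Gerrit change refs follow the layout `refs/changes/<last two digits of the change number>/<change number>/<patch set>`, which is why change 7818 lives under `18/`. The hypothetical helper below computes such a ref; its output can be passed to `docker build` with `--build-arg`.

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Return Gerrit's ref for one patch set of a change.

    Layout: refs/changes/<last two digits of change, zero-padded>/<change>/<patchset>
    """
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patchset must be positive integers")
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


if __name__ == "__main__":
    # Print a docker build argument for each revision 1..6 of change 7818.
    for patchset in range(1, 7):
        print(f"--build-arg NLSR_PATCH_REF={gerrit_change_ref(7818, patchset)}")
```

Zero-padding matters only for change numbers whose last two digits start with 0 (for example, change 5 maps to `refs/changes/05/5/<patch set>`).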

### 2. Docker Compose Override (`docker-compose.override.yml`)

Override to use the patched NLSR image:

```yaml
services:
  nlsr:
    image: ghcr.io/a-thieme/nlsr:patch-7818-v3
```

### 3. NLSR Config Template (`templates/nlsr/nlsr.conf.j2`)

**Critical fix:** The advertising section format changed in the patched NLSR.

**Old format (rejected by patch 7818):**
```jinja
advertising
{
{% for prefix in advertised_prefixes %}
prefix {{ prefix }}
{% endfor %}
}
```

**New format (required by patch 7818):**
```jinja
advertising
{
{% for prefix in advertised_prefixes %}
{{ prefix }} 1
{% endfor %}
}
```

Each entry changed from `prefix <name>` to `<name> <cost>`: a key-value pair whose key is the prefix name and whose value is its integer cost (here always `1`).
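A rendered config can be checked for leftover old-style entries before it is deployed. The sketch below is a hypothetical helper (not part of NLSR or the testbed framework); it assumes each entry is an NDN name starting with `/` followed by an integer cost, and that lines starting with `;` are comments, as in NLSR's sample configuration. NLSR's own parser remains the authority.

```python
import re

# Assumption: a valid entry is an NDN name (starting with "/") plus an integer cost.
NEW_ENTRY = re.compile(r"^/\S*\s+\d+$")


def check_advertising(section: str) -> list[str]:
    """List problems in a rendered `advertising { ... }` section (empty list = OK)."""
    problems = []
    for lineno, raw in enumerate(section.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, the section header, braces, and ';' comments.
        if not line or line in ("advertising", "{", "}") or line.startswith(";"):
            continue
        if line.startswith("prefix "):
            problems.append(f"line {lineno}: old 'prefix <name>' entry: {line!r}")
        elif not NEW_ENTRY.match(line):
            problems.append(f"line {lineno}: expected '<name> <cost>': {line!r}")
    return problems
```

Calling `check_advertising()` on the rendered section returns an empty list when every entry has the `<name> <cost>` form; an old `prefix <name>` line or a non-integer cost is reported with its line number.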

---

## Building the Patched Image

```bash
# Build the image
docker build -t ghcr.io/<username>/nlsr:patch-7818 -f nlsr.Dockerfile .

# Push to registry
docker push ghcr.io/<username>/nlsr:patch-7818
```

## Deploying on Testbed Nodes

On each testbed node:

```bash
# Pull or update the image
docker-compose pull

# Restart NLSR
docker-compose up -d nlsr
```

---

## Verification

### Check NLSR Version
```bash
docker exec testbed-nlsr-1 nlsr -V
# Expected: 24.08+git.7e80068b (or a similar string containing commit 7e80068)
```
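To run this check across many nodes, a small Python wrapper can do the comparison. This is a sketch: it assumes the version string embeds `git.<commit>` as in the expected output above and that the container is named `testbed-nlsr-1`; a different patch revision will have a different commit id.

```python
import subprocess

# Abbreviated commit taken from the expected `nlsr -V` output above.
PATCH_COMMIT = "7e80068"


def is_patched(version_output: str, commit: str = PATCH_COMMIT) -> bool:
    """True if the version string looks like a git build of the given commit."""
    return f"git.{commit}" in version_output


def container_nlsr_version(container: str = "testbed-nlsr-1") -> str:
    """Run `nlsr -V` in the container; needs Docker and a running container."""
    result = subprocess.run(
        ["docker", "exec", container, "nlsr", "-V"],
        capture_output=True, text=True, check=True,
    )
    # Combine both streams in case the version is printed to stderr.
    return (result.stdout + result.stderr).strip()
```

On a node, `is_patched(container_nlsr_version())` returns `True` when the running NLSR was built from the expected commit.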

### Check Identity Loading
```bash
docker logs testbed-nlsr-1 2>&1 | grep identity
# Expected: the router's identity name, NOT a "not found" message
```

### Verify Sequence Recovery Works

1. Corrupt the sequence file:
```bash
docker stop testbed-nlsr-1
echo -e "AdjLsaSeqNo: 5\nNameLsaSeqNo: 5" > /path/to/state/nlsrSeqNo.txt
docker start testbed-nlsr-1
```

2. Check recovery:
```bash
docker logs testbed-nlsr-1 2>&1 | grep "Received sync update for own router"
# Or check sequence numbers:
docker exec testbed-nlsr-1 cat /config/state/nlsrSeqNo.txt
# Should show values higher than 5 (recovered from the network)
```
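The file comparison in step 2 can also be scripted. The sketch below assumes the file contains one `<label>: <value>` (or `<label> <value>`) line per LSA type, using the labels written in step 1; if your NLSR build writes different labels, pass them via `keys`. The function names are hypothetical.

```python
from pathlib import Path

# Labels and baseline follow the corruption example in step 1 (assumed file layout).
DEFAULT_KEYS = ("AdjLsaSeqNo", "NameLsaSeqNo")
CORRUPTED_VALUE = 5


def read_seq_numbers(text: str) -> dict[str, int]:
    """Parse "<label>: <int>" or "<label> <int>" lines into {label: value}."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            values[parts[0].rstrip(":")] = int(parts[-1])
        except ValueError:
            continue  # ignore lines that do not end in an integer
    return values


def recovered(text: str, keys=DEFAULT_KEYS, baseline: int = CORRUPTED_VALUE) -> bool:
    """True only if every expected label is present and above the corrupted baseline."""
    values = read_seq_numbers(text)
    return all(values.get(key, baseline) > baseline for key in keys)


def recovered_file(path: str, **kwargs) -> bool:
    """Convenience wrapper: read the seqNo file from disk and check it."""
    return recovered(Path(path).read_text(), **kwargs)
```

For example, `recovered_file(p)` with `p` set to the host-side path of `nlsrSeqNo.txt` returns `True` once both sequence numbers have moved past 5; a missing label counts as not recovered.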

### Check Neighbor Connectivity

From a neighbor node (e.g., Singapore), check the entry for the restarted router (here, Osaka):
```bash
docker exec testbed-nlsr-1 nlsrc lsdb | grep -A5 osaka
# Should show current sequence numbers (not stale)
```

---

## Troubleshooting

### "Router identity not found" / "NLSR is running without security"

**Cause:** The `HOME=/config` environment variable is missing.

**Fix:** Ensure the runtime stage of the Dockerfile includes `ENV HOME=/config`.

### "Invalid cost format; only integers are allowed"

**Cause:** The advertising section uses the old format `prefix <name>` instead of `<name> <cost>`.

**Fix:** Update `templates/nlsr/nlsr.conf.j2` to use the `{{ prefix }} 1` format.
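Hand-maintained or already-rendered `nlsr.conf` files need the same rewrite. Below is a minimal migration sketch (a hypothetical helper, not part of the testbed framework), assuming the `advertising` header sits on its own line as in the template and that every old entry should receive the same cost (`1`, matching the template):

```python
import re

# An old-style entry: optional indentation, the word "prefix", then a name.
OLD_ENTRY = re.compile(r"^(\s*)prefix\s+(\S+)\s*$")


def migrate_advertising(conf: str, cost: int = 1) -> str:
    """Rewrite `prefix <name>` to `<name> <cost>` inside the advertising section only."""
    out = []
    in_advertising = False
    for line in conf.splitlines():
        stripped = line.strip()
        if stripped == "advertising":
            in_advertising = True        # assumes the header is on its own line
        elif in_advertising and stripped == "}":
            in_advertising = False       # closing brace of the section
        elif in_advertising:
            match = OLD_ENTRY.match(line)
            if match:
                line = f"{match.group(1)}{match.group(2)} {cost}"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf.endswith("\n") else "")
```

Restricting the rewrite to the `advertising` section avoids touching any other line that happens to start with the word `prefix`.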

### Sequence numbers stuck or not increasing

**Possible causes:**
1. Security not working (see the identity check under Verification)
2. Firewall blocking UDP port 6363
3. Network connectivity issues between nodes

---

## Image Tags

| Tag | Description |
|-----|-------------|
| `patch-7818-v1` | Initial build; wrong advertising format |
| `patch-7818-v2` | Advertising format fixed; `HOME=/config` still missing |
| `patch-7818-v3` | **Current working version**: all fixes applied |

---

## After Patch Merge: "Just Works" Deployment (Option B)

This section describes the **minimal changes** needed in named-data/testbed once patch 7818 is merged into NLSR, assuming **no backward compatibility** (Option B).

### Assumption

Patch 7818 is merged into the NLSR master branch and included in a future NLSR release or tag; the testbed simply needs to use that version.

### Required Changes

#### 1. named-data/testbed

**Single required change:**

Update `templates/nlsr/nlsr.conf.j2` - advertising section format:

```jinja
advertising
{
{% for prefix in advertised_prefixes %}
{{ prefix }} 1
{% endfor %}
}
```

That's it: a **one-line change**.

Everything else works automatically because:
- Named Data's CI builds NLSR Docker images with the merged patch
- Testbed's docker-compose.yml already references `ghcr.io/named-data/nlsr:latest` (or a tagged version)
- The NLSR Docker image includes `ENV HOME=/config` (already in official Dockerfile)

#### 2. NLSR Gerrit Patch (7818)

**No changes needed.** The patch works as-is. The breaking config format change is intentional and documented.

### How It Works After Merge

```
named-data/testbed:
├── docker-compose.yml ──► References NLSR image (auto-includes patch)
├── templates/nlsr/nlsr.conf.j2 ──► Generates <prefix> <cost> format ✓
└── Dockerfile ──► Uses official NLSR image (already has HOME=/config)
```

### Rolling Update Process

When patch 7818 is merged and deployed:

```bash
# 1. Pull latest NLSR image (includes merged patch)
docker-compose pull nlsr

# 2. The config regenerates automatically via the master container's cron job; to regenerate it manually:
python3 framework/main.py --dry

# 3. Restart NLSR on each node
docker-compose up -d nlsr
```

### What NOT to Change

The following files do **not** need modification:

- `nlsr.Dockerfile` - Only needed for custom builds before the patch is merged
- `docker-compose.override.yml` - Per-deployer override, not in upstream

### Summary

| Repository | Change Required | When |
|------------|----------------|------|
| named-data/testbed | Update `templates/nlsr/nlsr.conf.j2` | One-time after patch merge |
| NLSR (gerrit 7818) | None | Already works |

**Result:** After the single template update, deploying new NLSR versions "just works."

---

## Related Files

- `nlsr.Dockerfile` - Multi-stage build for patched NLSR (custom, not needed after merge)
- `docker-compose.override.yml` - Override to use patched image (per-deployer, not upstream)
- `templates/nlsr/nlsr.conf.j2` - NLSR configuration template (the only file needing update)
30 changes: 30 additions & 0 deletions docker-compose.override.yml
@@ -0,0 +1,30 @@
# docker-compose.override.yml - Override nlsr service for local build
#
# Docker Compose automatically reads this file when running `docker-compose up`.
# See: https://docs.docker.com/compose/compose-file/13-override/
#
# Workflow to build and deploy patched NLSR:
# 1. On local machine with Docker and ghcr.io access:
# docker build -t ghcr.io/a-thieme/nlsr:patch-7818 -f nlsr.Dockerfile .
# docker push ghcr.io/a-thieme/nlsr:patch-7818
#
# 2. Update the image: value below to point to your registry image
#
# 3. On remote machine:
# docker-compose pull
# docker-compose up -d nlsr

name: testbed
services:
  nlsr:
    # Build from the local Dockerfile (for local testing only):
    # uncomment build: to build locally; keep it commented out to pull the pre-built image
    # build:
    #   context: .
    #   dockerfile: nlsr.Dockerfile

    # Image from your registry (push the built image to ghcr.io first)
    image: ghcr.io/a-thieme/nlsr:patch-7818-v3

    # To use the pre-built upstream Named Data image (without patch 7818) instead:
    # image: ghcr.io/named-data/nlsr:latest
63 changes: 63 additions & 0 deletions nlsr.Dockerfile
@@ -0,0 +1,63 @@
# nlsr.Dockerfile - Build NLSR with Gerrit patch 7818 applied
# Patch 7818: Recover lost LSA seqNo from network via sync
#
# Uses Named Data's base images for proper library paths
# Build and push to your own ghcr.io registry:
# docker build -t ghcr.io/<username>/nlsr:patch-7818 -f nlsr.Dockerfile .
# docker push ghcr.io/<username>/nlsr:patch-7818

# Stage 1: Build using Named Data's build environment
FROM ghcr.io/named-data/ndn-cxx-build AS builder

ENV DEBIAN_FRONTEND=noninteractive

# Ensure SSL certs are available for git and install boost iostreams
RUN apt-get update && apt-get install -y ca-certificates libboost-iostreams-dev && rm -rf /var/lib/apt/lists/*

# Build psync from source (psync is not in the base image)
WORKDIR /build
RUN git clone --depth 1 --branch master https://github.com/named-data/psync.git psync
WORKDIR /build/psync
RUN ./waf configure --prefix=/usr --libdir=/usr/lib && \
    ./waf build -j$(nproc) && ./waf install

# Clone NLSR and apply patch
WORKDIR /build
RUN git clone --depth 1 --branch master https://github.com/named-data/NLSR.git nlsr
WORKDIR /build/nlsr

# Fetch patch 7818 from Gerrit
# Revision 6 was the latest at the time of writing; to use another revision,
# override with --build-arg NLSR_PATCH_REF=refs/changes/18/7818/<revision>
ARG NLSR_PATCH_REF=refs/changes/18/7818/6
RUN git fetch https://gerrit.named-data.net/NLSR.git ${NLSR_PATCH_REF} && \
    git checkout FETCH_HEAD

# Build NLSR using waf with --with-psync to find psync
RUN ./waf configure --prefix=/usr --libdir=/usr/lib --with-psync && \
    ./waf build -j$(nproc) && ./waf install

# Stage 2: Runtime image using Named Data's runtime base
FROM ghcr.io/named-data/ndn-cxx-runtime

ENV DEBIAN_FRONTEND=noninteractive
ENV HOME=/config

# Runtime dependencies for NLSR
RUN apt-get update && apt-get install -y \
        libsqlite3-0 \
        libpcap0.8 \
        libboost-iostreams1.83.0 \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

# Copy built NLSR and psync artifacts from builder
COPY --from=builder /usr/bin/nlsr /usr/bin/nlsr
COPY --from=builder /usr/bin/nlsrc /usr/bin/nlsrc
COPY --from=builder /usr/lib/libPSync.so* /usr/lib/

# Default entrypoint matches original image
ENTRYPOINT ["/usr/bin/nlsr"]
CMD ["-f", "/config/nlsr.conf"]

LABEL org.opencontainers.image.source=https://github.com/named-data/NLSR
LABEL org.opencontainers.image.description="NLSR with patch 7818 (LSA seqNo recovery via sync)"
2 changes: 1 addition & 1 deletion templates/nlsr/nlsr.conf.j2
@@ -54,7 +54,7 @@ fib
 advertising
 {
 {% for prefix in advertised_prefixes %}
-prefix {{ prefix }}
+{{ prefix }} 1
 {% endfor %}
 }
