
feat: Deep Robotics M20 autonomous navigation support #1768

Draft
aphexcx wants to merge 1609 commits into dimensionalOS:dev from aphexcx:feat/deeprobotics-m20-nav

Conversation


aphexcx commented Apr 10, 2026

Summary

Adds full Deep Robotics M20 quadruped support to dimos, including autonomous navigation via ARISE SLAM running in a Docker container on the robot's NOS board.

What's included

  • M20 Connection Module: RTSP camera, UDP velocity control, dead-reckoning + ROS2 odometry, drdds /NAV_CMD for navigation mode
  • ARISE SLAM Integration: Docker-based deployment with drdds SHM bridge (lidar + IMU), patched Velodyne processing pipeline (Findings no.8, no.11 in ARISE log), lifecycle timer fixes for ARM64 (Finding no.7)
  • SmartNav Blueprint: Container runs C++ nav stack (ARISE SLAM + TerrainAnalysis + LocalPlanner + PathFollower), host runs Python modules (PGO, CmdVelMux, ClickToGoal, RerunBridge)
  • Rerun Visualization: 3D point cloud + 2D camera panel with color_image streaming, smart_nav visual overrides for terrain/costmap/trajectory
  • Deploy Tooling: deploy.sh for remote M20 management (push/pull Docker images, bridge start/stop, service conflict resolution, NAT setup) with Tailscale support
  • Native /NAV_CMD Publisher: pybind11 module using host drdds (bypasses rclpy glibc incompatibility on Ubuntu 20.04 NOS)
  • drdds SHM Bridge: drdds_recv (host) + ros2_pub (container) for lidar point cloud + IMU data
  • M20 Official Dev Guide: Included as reference documentation

Architecture

M20 Hardware Topology

The M20 has three compute boards connected via internal Ethernet (10.21.31.0/24). dimos runs on NOS.

```mermaid
graph TB
    subgraph M20["Deep Robotics M20"]
        subgraph AOS["AOS — Main Controller<br/>10.21.31.103"]
            A1[Motor control]
            A2[lio_perception]
            A3["rsdriver (lidar)"]
            A4["yesense (IMU)"]
            A5[RTSP camera]
            A6["WiFi AP (10.21.41.1)"]
        end
        subgraph GOS["GOS — Comms<br/>10.21.31.104"]
            G1[5G modem]
            G2[Tailscale VPN]
            G3[NAT gateway]
        end
        subgraph NOS["NOS — Compute<br/>10.21.31.106"]
            N1[Docker host]
            N2[dimos runtime]
            N3[drdds_recv]
            N4["Python 3.12 · 16GB RAM · 62GB eMMC"]
        end
        subgraph Sensors["Sensors"]
            S1["2x RoboSense RSAIRY 192-ch lidar<br/>front: 224.10.10.201 · back: 224.10.10.202"]
            S2["Yesense IMU (/IMU_YESENSE)"]
            S3["RTSP Camera (rtsp://AOS:8554/video1)"]
        end
        AOS --- ETH["Internal Ethernet<br/>10.21.31.0/24"]
        GOS --- ETH
        NOS --- ETH
    end
```

Software Stack (NOS)

C++ navigation modules run inside a Docker container (ROS2 Humble, Ubuntu 22.04), while Python modules run on the NOS host (Ubuntu 20.04, Python 3.12). This split is necessary because:

  • ARISE SLAM, TerrainAnalysis, LocalPlanner, and PathFollower are C++ modules that require ROS2 Humble (glibc 2.32+)
  • NOS runs Ubuntu 20.04 (glibc 2.31) -- can't run Humble natively
  • dimos Python modules (PGO, CmdVelMux, ClickToGoal) don't need ROS2
  • The container talks to dimos via RPC (DockerModuleProxy)
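The version constraint driving this split can be sketched as a toy check (illustrative code, not from the PR):

```python
# ROS2 Humble binaries need glibc >= 2.32, while the NOS host ships
# Ubuntu 20.04 with glibc 2.31 -- hence the container for the C++ stack.
def humble_runs_natively(glibc_version: str) -> bool:
    major, minor = (int(x) for x in glibc_version.split(".")[:2])
    return (major, minor) >= (2, 32)
```

So the C++ navigation modules go into the Humble container while the Python modules, which have no ROS2 dependency, stay on the host.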
```mermaid
graph TB
    subgraph NOS["NOS (10.21.31.106)"]
        subgraph Host["Host — Ubuntu 20.04, Python 3.12 via uv"]
            subgraph Coordinator["dimos ModuleCoordinator (m20_smartnav)"]
                MC["M20Connection<br/>· RTSP camera<br/>· /NAV_CMD (drdds)"]
                PGO["PGO<br/>· loop closure<br/>· iSAM2"]
                CMX["CmdVelMux<br/>· teleop + nav<br/>· velocity mux"]
                CTG["ClickToGoal<br/>· click-to-nav"]
                RR["RerunBridgeModule<br/>gRPC :9877"]
                WS["WebsocketVisModule<br/>Command Center :7779"]
            end
        end
        subgraph Docker["Docker Container 'dimos-nav' — ROS2 Humble, Ubuntu 22.04"]
            R2P["ros2_pub<br/>(SHM → ROS2 bridge)"]
            ARISE["ARISE SLAM<br/>· feature_extraction<br/>· laser_mapping<br/>· imu_preintegration"]
            TA["Terrain<br/>Analysis"]
            LP["Local Planner +<br/>Path Follower"]
            R2P --> ARISE --> TA --> LP
        end
        subgraph Bridge["drdds_recv — host process, root"]
            DR["Subscribes /LIDAR/POINTS + /IMU via drdds<br/>Writes POSIX SHM to /dev/shm"]
        end
        Host -- "RPC (DockerModuleProxy)" --> Docker
        DR -- "/dev/shm (--ipc host)" --> R2P
    end
```

Sensor Data Flow

End-to-end path from hardware sensors through ARISE SLAM to motor commands:

```mermaid
flowchart TB
    subgraph AOS["AOS"]
        RS["rsdriver<br/>/LIDAR/POINTS"]
        YS["yesense<br/>/IMU_YESENSE"]
    end

    subgraph NOS_Host["NOS Host"]
        RECV["drdds_recv<br/>(POSIX SHM)"]
    end

    subgraph Container["Docker Container"]
        PUB["ros2_pub"]
        FE["feature_extraction"]
        LM["laser_mapping"]
        IP["imu_preintegration"]
        TA2["TerrainAnalysis"]
        LP2["LocalPlanner"]
        PF["PathFollower"]
        RNAV["ROSNav Module<br/>(ROS2 → LCM bridge)"]
    end

    subgraph NOS_Modules["NOS Host Modules"]
        PGO2["PGO<br/>loop closure + global map"]
        CMX2["CmdVelMux<br/>teleop + nav merge"]
        M20C["M20Connection"]
        NCP["nav_cmd_pub<br/>(pybind11)"]
    end

    RS -- "drdds SHM" --> RECV
    YS -- "drdds SHM" --> RECV
    RECV -- "/dev/shm" --> PUB
    PUB -- "/bridge/LIDAR_POINTS" --> FE
    PUB -- "/bridge/IMU" --> IP
    FE --> LM
    IP --> LM
    LM -- "/state_estimation" --> RNAV
    LM -- "/registered_scan" --> RNAV
    LM --> TA2 --> LP2 --> PF
    PF -- "cmd_vel" --> RNAV
    RNAV -- "LCM /odometry" --> PGO2
    RNAV -- "LCM /registered_scan" --> PGO2
    RNAV -- "LCM /nav_cmd_vel" --> CMX2
    CMX2 -- "LCM /cmd_vel" --> M20C
    M20C --> NCP
    NCP -- "drdds /NAV_CMD" --> AOS_Motor["AOS<br/>Motor Controller"]
```
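The CmdVelMux merge step in the flow above can be sketched as a priority selector. The timeout value, method names, and teleop-wins policy here are illustrative assumptions, not the module's actual API:

```python
import time

class CmdVelMuxSketch:
    """Toy velocity mux: teleop commands win over nav for a short window."""
    TELEOP_TIMEOUT = 0.5  # assumed: teleop holds priority 0.5 s after last input

    def __init__(self):
        self._teleop_cmd = None
        self._last_teleop = float("-inf")

    def on_teleop(self, cmd):
        self._teleop_cmd = cmd
        self._last_teleop = time.monotonic()

    def select(self, nav_cmd):
        # Recent teleop input overrides the autonomous nav command.
        if time.monotonic() - self._last_teleop < self.TELEOP_TIMEOUT:
            return self._teleop_cmd
        return nav_cmd
```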

Startup Ordering (Critical)

drdds uses FastDDS Shared Memory transport for zero-copy IPC. The SHM discovery mechanism requires subscribers to start before publishers, otherwise the subscriber never discovers the publisher's SHM segments.

```mermaid
flowchart LR
    subgraph Order["Startup Sequence"]
        direction TB
        S1["1. drdds_recv starts<br/>Creates SHM segments"]
        S2["2. rsdriver restarts<br/>Discovers SHM, publishes lidar"]
        S3["3. yesense restarts<br/>Discovers SHM, publishes IMU"]
        S4["4. Docker container starts<br/>ros2_pub reads /dev/shm"]
        S5["5. ARISE SLAM activates<br/>Lifecycle: 5/10/15s timers"]
        S6["6. IMU init completes<br/>~200 samples for gravity cal"]
        S7["7. SLAM producing<br/>/state_estimation + /registered_scan"]
        S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7
    end
```

If this ordering is violated (e.g., rsdriver starts before drdds_recv), the SHM bridge silently fails -- lidar data never reaches ARISE SLAM. The deploy.sh bridge-start command enforces this ordering.
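The enforcement boils down to "don't restart the sensor drivers until the subscriber's SHM segments exist". A sketch of that gate (segment name and timeout are illustrative):

```python
import pathlib
import time

def wait_for_shm(segment: str, timeout: float = 10.0) -> bool:
    """Poll /dev/shm until drdds_recv's named segment appears."""
    deadline = time.monotonic() + timeout
    path = pathlib.Path("/dev/shm") / segment
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(0.2)
    return False
```

A bridge-start script would then start drdds_recv, wait for the segment, and only afterwards restart rsdriver/yesense so the publishers discover the subscriber's segments.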

Remote Access (Tailscale)

```mermaid
flowchart LR
    subgraph Mac["Developer Mac"]
        DV["dimos-viewer"]
        T1["localhost:9877"]
        T2["localhost:7779"]
        T3["localhost:3030"]
    end
    subgraph GOS_VPN["GOS (Tailscale)<br/>100.89.134.73"]
        TS["5G + VPN<br/>NAT gateway"]
    end
    subgraph NOS_Ports["NOS"]
        P1[":9877 Rerun gRPC"]
        P2[":7779 Command Center"]
        P3[":3030 RerunWebSocket"]
    end
    T1 -- "SSH tunnel" --> TS
    T2 -- "SSH tunnel" --> TS
    T3 -- "SSH tunnel" --> TS
    TS -- NAT --> P1
    TS -- NAT --> P2
    TS -- NAT --> P3
    DV -- "--connect" --> T1
    DV -- "--ws-url" --> T3
```
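The SSH tunnels in the diagram are roughly equivalent to the following command assembly (the "gogo" host alias and exact flags are assumptions; deploy.sh's viewer command does the real work):

```python
NOS_PORTS = [9877, 7779, 3030]  # Rerun gRPC, Command Center, RerunWebSocket

def tunnel_args(host: str = "gogo") -> list[str]:
    """Build an ssh command forwarding each NOS port through the GOS Tailscale node."""
    args = ["ssh", "-N"]
    for port in NOS_PORTS:
        args += ["-L", f"{port}:localhost:{port}"]
    return args + [host]
```

dimos-viewer then connects to localhost:9877 (`--connect`) and localhost:3030 (`--ws-url`) as if the robot were on the local machine.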

Included documentation

This PR includes extensive investigation logs and planning documents that serve as critical context for understanding the M20 platform and the architectural decisions made.

Investigation logs

| Document | Description |
| --- | --- |
| INVESTIGATION_LOG.md | 52 sections covering the initial M20 integration -- LiDAR data flow, mac_bridge TCP transport, drdds/DDS topic discovery, CycloneDDS vs FastDDS interop, NOS Docker setup, RTSP camera, dead-reckoning odometry, and getting dimos running on a Mac receiving all M20 topics. The foundational investigation that preceded the ROSNav migration. |
| plans/m20-rosnav-migration/ROSNAV_MIGRATION_LOG.md | 30 findings documenting the full migration from the M20's proprietary drdds nav stack to dimos ROSNav. Covers DDS discovery (Finding no.1), NOS resource constraints (Finding no.5), Docker networking (Finding no.9), FastDDS SHM cross-version compatibility (Finding no.16), lidar multicast topology (Finding no.22), drdds SHM bridge design (Finding no.25), and more. Essential reading for anyone working on the M20 or similar embedded ARM64 robots with proprietary middleware. |
| plans/m20-rosnav-migration/ARISE_SLAM_LOG.md | 17 findings on integrating ARISE SLAM's Velodyne mode with the M20's dual RoboSense RSAIRY 192-channel lidars. Key discoveries: ARISE's Velodyne processing pipeline was never functional upstream -- undistortionAndscanregistration() is commented out in all forks (Finding no.8). Documents the IMU-triggered processing fix from SuperOdom (Finding no.11), 192-to-64 channel ring remapping (Finding no.2), QoS mismatch (Finding no.3), float32 time truncation (Finding no.4), and lifecycle activation race conditions on ARM64 (Finding no.7). |

Planning documents (plans/m20-rosnav-migration/)

| Directory | Contents |
| --- | --- |
| 00-discovery/ | Initial DDS topic discovery and QoS analysis from live M20 hardware |
| 01-scope/ | Requirements gathering -- multi-LLM question generation (Opus, GPT, Gemini), question triage, and system context analysis |
| 02-spec/ | Technical specification with key decisions table, phased delivery plan, and spec review assessment |
| 03-plan/ | Implementation plan with task breakdown, dependency graph, and plan review |
| 04-beads/ | Issue tracking drafts (beads format) with review passes |
| 05-drdds-bridge/ | Detailed plan for the drdds SHM bridge (lidar + IMU data path) |

Reference documentation

| Document | Description |
| --- | --- |
| dimos/robot/deeprobotics/m20/docs/m20-official-software-development-guide.md | Official Deep Robotics M20 software development guide -- covers the 3-board architecture (AOS/GOS/NOS), drdds middleware, UDP protocol, DDS topics, network topology, and sensor specifications. |

Hardware details

  • M20 3-board architecture: AOS (10.21.31.103, main controller), GOS (10.21.31.104, 5G comms), NOS (10.21.31.106, Docker host + compute)
  • Sensors: 2x RoboSense RSAIRY 192-channel lidars (merged via drdds), Yesense IMU, RTSP camera
  • Verified on: M20-770 ("gogo") via Tailscale

Key files

| Path | Purpose |
| --- | --- |
| dimos/robot/deeprobotics/m20/connection.py | Main M20 connection module |
| dimos/robot/deeprobotics/m20/blueprints/rosnav/m20_smartnav.py | SmartNav blueprint |
| dimos/robot/deeprobotics/m20/rosnav_docker.py | Docker container management |
| dimos/robot/deeprobotics/m20/velocity_controller_dds.py | Native drdds velocity control |
| dimos/robot/deeprobotics/m20/docker/deploy.sh | Deployment tooling |
| dimos/robot/deeprobotics/protocol.py | Deep Robotics UDP protocol |
| dimos/robot/deeprobotics/m20/docker/drdds_bridge/ | SHM bridge source (drdds_recv + ros2_pub) |

Test plan

  • ARISE SLAM producing /state_estimation and /registered_scan on M20 hardware
  • Costmap visible in command center (port 7779)
  • Color image streaming in rerun viewer (~7fps)
  • nav_cmd_pub publishing /NAV_CMD via native drdds
  • CmdVelMux routing teleop + nav cmd_vel
  • End-to-end WASD teleop via dimos-viewer
  • Click-to-goal autonomous navigation
  • Multi-floor mapping with PGO loop closure

🤖 Generated with Claude Code

jeff-hykin and others added 30 commits March 30, 2026 14:28
# Conflicts:
#	dimos/manipulation/blueprints.py
#	dimos/robot/all_blueprints.py
…tion

Resolve merge conflicts keeping dimos3-specific WebsocketVisModule while
adopting vis_module branch improvements: deferred LCM instantiation,
collapsed rerun-connect case, per-client teleop tracking, configurable
start_timeout, connect-mode gRPC serving, and lru_cache cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aphexcx added 30 commits April 10, 2026 17:54
Add DrddsLidarBridge — a C++ NativeModule that reads lidar + IMU from
POSIX SHM (written by drdds_recv) and publishes to LCM with full
ring/time field preservation for ARISE SLAM. Replaces ros2_pub.cpp
(ROS2 output) with direct LCM output, eliminating the Docker container.

New m20_smartnav_native blueprint runs the entire nav stack on the host:
  DrddsLidarBridge → AriseSLAM → smart_nav() → M20Connection

Includes nix build infrastructure (CMakeLists.txt + flake.nix) and
migration plan with Codex-reviewed critical issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
…city controller

M20Connection now switches to NAVIGATION mode + agile gait when using
the DDS velocity controller (native nav without rclpy), not just when
rclpy is available. Also skips dead-reckoning odometry startup since
AriseSLAM provides odometry in that configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Dead-reckoning is no longer needed — AriseSLAM provides odometry.
Mac bridge (TCP to GOS) is replaced by native DrddsLidarBridge.

Simplifies M20Connection to: camera (RTSP), velocity control
(/NAV_CMD via drdds or UDP), and robot state management.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Nix's coreutils (glibc 2.42) uses fchmodat2 syscall which kernel 5.10
(Rockchip RK3588) doesn't have. Override unpackPhase and fixupPhase
to use host coreutils (/usr/bin/cp, /usr/bin/chmod) instead.

Also requires --option sandbox false and /nix bind-mounted to ext4
(NOS root is overlayfs which has additional cp permission issues).

Build with: nix build --extra-experimental-features 'nix-command flakes' --option sandbox false

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
… cleanup

- Fix DrddsLidarBridge cwd from absolute to relative "cpp"
- Set build_command=None for all NativeModules (pre-built via nix)
- Fix M20Connection IP (use literal instead of global_config at load time)
- Remove dead-reckoning odometry entirely
- Remove mac bridge client
- Add arise-build-wrapper.nix for kernel 5.10 workaround
- Update DrddsLidarBridge flake with unpackPhase/fixupPhase host coreutils fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
…ocker

Full native pipeline verified on NOS:
- DrddsLidarBridge → AriseSLAM → TerrainAnalysis → LocalPlanner →
  PathFollower → PGO → CmdVelMux → M20Connection
- All NativeModules built via nix on ARM64 (kernel 5.10 workaround)
- Point clouds + color images streaming in dimos-viewer
- WASD teleop reaches CmdVelMux

Blocker: /NAV_CMD velocity commands can't reach AOS motor controller
from NOS — drdds SHM is local-only, no cross-board transport configured.
Investigating drdds UDP transport configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
…blocker

Finding no.2: drdds SDK hardcodes SHM-only transport. Cross-board
/NAV_CMD (NOS→AOS) is impossible via drdds config alone. Investigated
drqos.xml modifications, FASTRTPS_DEFAULT_PROFILES_FILE env var, and
enable_config flag — none enable UDP transport.

Recommended fix: TCP/UDP bridge on AOS that receives velocity commands
from NOS and publishes /NAV_CMD locally via drdds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
…'t match

Findings:
- DrDDSPublisher matched_count=0 even locally on AOS (same machine)
- enable_discovery=true didn't help
- basic_server uses NavCmdPubSubType but topic name may differ
- The only proven /NAV_CMD path was mac_bridge via rclpy on GOS
- DrDDSPublisher may create incompatible DDS endpoints vs ROS2

Next: lightweight ROS2 velocity bridge on GOS (same pattern as mac_bridge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
DrDDSPublisher/Channel creates DDS endpoints incompatible with what
basic_server expects. Only proven working /NAV_CMD path was via rclpy
on GOS (mac_bridge). Need to investigate ROS2 vs raw drdds publisher
compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Replace drdds nav_cmd_pub with TCP socket to nav_cmd_rclpy_bridge on
AOS. The rclpy bridge publishes /NAV_CMD via ROS2 which is the only
proven path that basic_server accepts.

Chain: M20Connection → TCP:9740 → AOS rclpy bridge → /NAV_CMD
Bridge connection verified (bridge=True in logs).

Remaining issue: CmdVelMux receives teleop ("Teleop active") but
doesn't output non-zero cmd_vel — needs investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Three bugs found by Codex preventing WASD teleop from reaching robot:

1. velocity_controller_dds.py: no reconnect on initial TCP connect
   failure — controller stayed dead until full restart. Added periodic
   reconnect attempts in _publish_once().

2. nav_cmd_rclpy_bridge.py: bridge wedged after first client disconnect,
   looping on dead socket instead of returning to accept(). Fixed recv
   logic with _recv_packet() helper.

3. transport.py: LCMTransport.broadcast() started unnecessary receive
   loop on publish-only transports, adding socket/listener load.

Tests added for all three fixes (4 passed).

Co-Authored-By: Codex (GPT-5.4) <noreply@openai.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
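The reconnect fix in point 1 can be sketched as follows. This is an illustrative pattern, not the actual velocity_controller_dds.py code: each publish tick retries the TCP connect instead of giving up after the first failure.

```python
import socket
import time

class ReconnectingSender:
    """Retry the TCP connect on every publish tick rather than dying once."""

    def __init__(self, addr, retry_interval: float = 1.0):
        self.addr = addr
        self.retry_interval = retry_interval
        self.sock = None
        self._next_attempt = 0.0  # monotonic time of next allowed connect

    def _publish_once(self, payload: bytes) -> bool:
        now = time.monotonic()
        if self.sock is None and now >= self._next_attempt:
            try:
                self.sock = socket.create_connection(self.addr, timeout=0.5)
            except OSError:
                self._next_attempt = now + self.retry_interval
                return False
        if self.sock is None:
            return False  # still in backoff window
        try:
            self.sock.sendall(payload)
            return True
        except OSError:
            self.sock.close()
            self.sock = None  # drop and retry on the next tick
            return False
```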
The breakthrough: adding a host route on NOS (10.21.33.103 via 10.21.31.103)
lets rclpy reach AOS's DDS discovery on the 10.21.33.x subnet where
basic_server announces. No GOS bridge, no TCP hop, no firmware changes.

NOS rclpy publisher matches AOS basic_server in <1 second.
WASD in dimos-viewer moves the robot.

Full native architecture (no Docker, no bridge):
  NOS: DrddsLidarBridge → AriseSLAM → SmartNav → M20Connection
       → rclpy /NAV_CMD (FASTRTPS_DEFAULT_PROFILES_FILE targeting
         10.21.33.103) → AOS basic_server → motors

Prerequisites on NOS after reboot:
  sudo ip route add 10.21.33.103/32 via 10.21.31.103
  sudo ip link set lo multicast on
  sudo ip route add 224.0.0.0/4 dev lo
  sudo mount --bind /var/opt/robot/data/nix /nix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Deleted (no longer needed with native nav stack):
- Docker: Dockerfile.nav, m20_entrypoint.sh, launch_nos.py
- Bridges: nav_cmd_rclpy_bridge.py, nav_cmd_bridge.cpp,
  nav_cmd_publisher.cpp, ros2_pub.cpp, build_nav_cmd_pub.sh
- Mac bridge: mac_bridge.py, mac_bridge_client.py
- Dead-reckoning: odometry.py
- Old blueprints: m20_rosnav.py, m20_smartnav.py (replaced by
  m20_smartnav_native.py)
- rosnav_docker.py (Docker container management)
- velocity_controller.py (UDP, replaced by DDS/TCP controller)
- arise-build-wrapper.nix (one-off build helper)
- Tests for removed code

Kept:
- drdds_recv.cpp (SHM bridge for lidar/IMU)
- DrddsLidarBridge NativeModule
- m20_smartnav_native.py (the working blueprint)
- velocity_controller_dds.py (now publishes via rclpy on NOS)
- connection.py (camera + protocol)
- Config files (arise_slam_m20.yaml, fastdds_m20.xml, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
NavCmdPub publishes to rt/NAV_CMD via raw FastDDS 2.14 API, matching
AOS basic_server's ROS2 subscriber. No rclpy, no ROS, no TCP bridge.

The key insight: ROS2 rmw_fastrtps prefixes DDS topics with 'rt/',
so basic_server subscribes on 'rt/NAV_CMD' not '/NAV_CMD'. Our raw
FastDDS publisher uses the same convention + type name to match.

WASD teleop verified working end-to-end. Full native stack:
  DrddsLidarBridge → AriseSLAM → SmartNav → CmdVelMux
  → NavCmdPub (FastDDS rt/NAV_CMD) → AOS basic_server → motors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
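The rt/ mangling called out above is the standard ROS2-to-DDS topic name mapping; as a one-liner sketch:

```python
def ros2_to_dds_topic(ros_topic: str) -> str:
    """rmw_fastrtps publishes ROS2 topic /X on DDS topic rt/X."""
    return "rt" + ros_topic if ros_topic.startswith("/") else "rt/" + ros_topic
```

This is why a raw FastDDS publisher must open rt/NAV_CMD, not /NAV_CMD, to match basic_server's ROS2 subscriber.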
No longer needed — NavCmdPub NativeModule publishes directly via
raw FastDDS. No TCP, no rclpy, no Python bridge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Complete investigation log with 5 findings documenting the journey
from Docker+ROS to fully native: nix kernel workaround, drdds SHM
limitations, rt/ topic prefix discovery, and raw FastDDS solution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Remove all dead code from M20Connection:
- Velocity controller (now handled by NavCmdPub NativeModule)
- cmd_vel In port (NavCmdPub subscribes directly)
- ROS sensors path (rclpy/M20ROSSensors)
- CycloneDDS lidar fallback
- UDP velocity controller
- enable_ros, enable_lidar params

M20Connection is now purely: camera (RTSP) + robot state (UDP protocol
for stand/sit/gait/mode). Clean and focused.

Also removed: ros_sensors.py, lidar.py, skill_container.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
- Rename blueprints/rosnav/ → blueprints/nav/ (no ROSNav anymore)
- Delete broken old blueprints (m20_smart, m20_minimal, m20_agentic)
- Fix blueprint: remove deleted params (enable_ros, enable_lidar)
- Fix connection.py: remove unused Twist import
- Rewrite deploy.sh: 700→100 lines, just setup + bridge commands
- Delete Docker-era artifacts: launch files, drdds_msgs, fastdds.xml
- Delete ros_sensors.py, lidar.py, skill_container.py

Final M20 file count: 21 files (down from 59)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
- Rename docker/ → config/ (nothing runs in Docker)
- Move deploy.sh to M20 root (not a Docker tool)
- Move drdds_recv.cpp + shm_transport.h into drdds_bridge/cpp/
  (consolidated with DrddsLidarBridge and NavCmdPub)
- Delete empty docker/ directory

19 files total. Clean layout:
  blueprints/nav/  — working blueprint
  config/          — ARISE + planner YAML
  drdds_bridge/    — all C++ NativeModules
  docs/            — M20 dev guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
__init__.py was importing deleted modules (skill_container,
velocity_controller, ros_sensors, mac_bridge_client). Simplified
to just M20Connection + M20RTSPCamera.

deploy.sh: use remote_sudo() helper with NOPASSWD support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
Added:
  sync    - rsync dimos source + fix nix binary symlinks
  start   - launch smartnav on NOS
  stop    - kill smartnav
  restart - stop + start
  viewer  - SSH tunnels + dimos-viewer in one command

Quick deploy: ./deploy.sh sync --host gogo && ./deploy.sh restart --host gogo
After reboot: ./deploy.sh setup && bridge-start && start && viewer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
…ourney summary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
H2: deploy.sh restart now handles missing --host flag correctly
H3: removed dead _publish_tf/_odom_to_tf and undeclared self.tf
M4: added comment documenting 10Hz keepalive as intentional
M5: deploy.sh stop sends SIGTERM first, waits 5s, then SIGKILL
M7: removed _camera_info type annotation (only set when camera enabled)
L2: removed unused g_vel_seq atomic counter from nav_cmd_pub.cpp
L5: updated drdds_recv.cpp and shm_transport.h comments (ros2_pub→DrddsLidarBridge)

Also: removed unused imports (PoseStamped, Quaternion, Transform, Vector3)
and lidar_height param from M20Connection (TF handled elsewhere).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
- airy_imu_bridge: UDP multicast reader for RSAIRY's built-in IMU
  - Parses 51-byte packet per rs_driver decoder_RSAIRY.hpp
  - FSR-aware unit conversion (g, dps → m/s², rad/s)
  - Empirically-derived rotation into base_link (front/rear)
  - PTP-lock sanity gate drops pre-2024 timestamps
  - 200.0-200.4 Hz verified live
- fastlio2 wrapper: --native_clock CLI flag
  - Bypasses wall-clock-anchor workarounds from Findings dimensionalOS#7-8
  - Enforces monotonic floor on frame_ts (Finding dimensionalOS#9 revisited)
  - Exposed via FastLio2Config.native_clock
- velodyne.yaml: identity extrinsic + extrinsic_est_en=true (valid only
  when IMU comes from airy_imu_bridge)
- M20 blueprint: M20_FASTLIO2_IMU env var selects yesense|airy, wires
  AiryImuBridge conditionally
- drdds_bridge flake: dontUnpack + dontFixup (kernel 5.10 fchmodat2)
- CMakeLists: BUILD_NAV_CMD_PUB option (skipped in nix; built locally)
- Vendor dimos_native_module.hpp into drdds_bridge/cpp/include/ so nix
  build is self-contained
- deploy.sh: symlink airy_imu_bridge alongside drdds_lidar_bridge

Status (FASTLIO2_LOG Finding dimensionalOS#18):
- First ~50s of stationary drift: within 2m of origin (huge improvement
  from yesense path's 28m/15s baseline)
- After ~50s: drift explodes. Diagnosed as fastlio2's LCM IMU subscriber
  stalling 2-4s while lidar keeps arriving. Codex-predicted single-
  threaded handler + slow feed_lidar_pc2 on 100k-160k point clouds.
- Y-axis base-frame accel bias (~0.3 m/s²) unresolved. Earlier attempted
  fix subtracted on wrong sensor axis; reverted.

Next: split LCM subscriber into lidar+IMU threads, or drop-old IMU queue,
or try time_sync_en=true in velodyne.yaml. See FASTLIO2_LOG Finding dimensionalOS#18.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
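The unit-conversion step in airy_imu_bridge amounts to the following sketch (this assumes FSR scaling of the raw counts to g and deg/s has already been applied):

```python
import math

G = 9.80665  # standard gravity, m/s^2

def convert_airy_units(accel_g, gyro_dps):
    """Convert accel from g to m/s^2 and gyro from deg/s to rad/s."""
    accel = tuple(a * G for a in accel_g)
    gyro = tuple(math.radians(w) for w in gyro_dps)
    return accel, gyro
```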
…nary)

Stacked fixes that together unblock FAST-LIO2 on M20 with Airy integrated IMU:

1. LCM transport split (codex review 2026-04-21): fastlio2 callbacks now only
   copy raw bytes into bounded std::deque queues (lidar cap 3, IMU cap 400).
   New fastlio_owner_loop thread drains queues with IMU priority (all pending
   IMU drained before each lidar frame) and calls feed_imu / feed_lidar_pc2
   serially. Previously a slow lidar callback (100k-pt conversion) blocked
   ~20 IMU messages in LCM's socket buffer, stalling imu_latest 2-4s, which
   caused the 50s-then-explode drift pattern.

2. drdds_recv: 5-channel subscriber (lidar/lidar2/IMU/IMU201/IMU202) writing
   to per-channel SHM segments. Enables rsdriver's send_separately:true mode
   to deliver both Airy lidars as independent streams in drdds_bridge_lidar
   and drdds_bridge_lidar2. CMake BUILD_NAV_CMD_PUB option gates the
   host-libs path that also covers this binary.

3. airy_imu_bridge: --frame {base_link|sensor} flag + persistent base-frame
   accel bias subtraction (-0.13/-0.35/-0.22 m/s² on x/y/z for front Airy).
   Earlier attempt subtracted in sensor frame on the wrong axis (sensor Y
   maps to base Z under R_base_from_front, not base Y) — reverted.

Results: 5+ min stationary test = 1 keyframe at origin (vs hundreds of meters
of drift in all prior runs). Zero queue drops. imu_vs_frame_end within ±0.25s
sustainably. Production-grade single-lidar single-IMU baseline.

Next: extend drdds_bridge LCM for 2nd lidar; port FAST_LIO_MULTI async mode
onto our leshy fork base for dual-lidar operation. Per codex review, do those
on top of this clean baseline rather than before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
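The transport split in point 1 can be sketched like this (queue caps taken from the commit; the function names are illustrative). Callbacks only append raw bytes; a single owner loop drains with IMU priority so a slow lidar conversion can never starve the IMU path:

```python
from collections import deque

lidar_q = deque(maxlen=3)    # drop-oldest cap: 3 lidar frames
imu_q = deque(maxlen=400)    # drop-oldest cap: 400 IMU samples

def drain_once(feed_imu, feed_lidar):
    """Drain every pending IMU sample before feeding the next lidar frame."""
    while imu_q:
        feed_imu(imu_q.popleft())
    if lidar_q:
        feed_lidar(lidar_q.popleft())
```

The bounded deques also give the drop-oldest behavior for free: appending past maxlen silently evicts the stalest frame.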
Replaces the hardcoded (0.13, 0.35, 0.22) base-frame bias subtraction with
the RSAIRY factory IMU-to-lidar quaternion read from DIFOP register C.17
(7×uint32 BE at packet offset 1092, bit-cast to float32). Per-unit, per-
robot correct — no more bandaid that would fail under tilt/temperature.

Implementation:
- Parse DIFOP at 224.10.10.201:7781 (front) / 202:7782 (rear), validate
  magic header (0xA5FF005A11115555) and tail (0x0FF0), defensive quaternion
  renormalization, reject when |norm-1| > 0.01.
- Compose R_imu_to_base = R_base_lidar · R_imu_to_lidar; hot-swap via
  atomic<Mat3*> on successful DIFOP receipt.
- Background retry thread polls DIFOP every ~2s until success.
- Degraded startup: IMU path starts immediately with R_imu_to_lidar = I
  (no blocking on DIFOP); status line carries cal=PENDING|OK|N/A flag.
- Shutdown propagation: set g_running=false on recv error so join()
  doesn't hang.

Also includes Findings dimensionalOS#19/dimensionalOS#20/dimensionalOS#21 in FASTLIO2_LOG.md covering the
rsdriver separate mode + discovery ordering work, LCM transport split,
and the DIFOP rollout plan (codex-reviewed, both rounds).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
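The DIFOP parse described above, sketched in Python. The quaternion's position within the seven floats and the wxyz ordering are assumptions for illustration; magic header/tail validation is omitted here:

```python
import math
import struct

DIFOP_QUAT_OFFSET = 1092  # register C.17: 7 x uint32 BE, bit-cast to float32

def parse_difop_quaternion(packet: bytes):
    """Return a renormalized (w, x, y, z) or None if the norm check fails."""
    vals = struct.unpack_from(">7f", packet, DIFOP_QUAT_OFFSET)
    qw, qx, qy, qz = vals[:4]  # assumed: quaternion first, wxyz order
    norm = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    if abs(norm - 1.0) > 0.01:
        return None  # reject; caller keeps R_imu_to_lidar = identity
    return (qw / norm, qx / norm, qy / norm, qz / norm)
```

A None return maps to the degraded-startup path: keep publishing with the identity rotation and flag cal=PENDING until a valid DIFOP arrives.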
…lOS#22/dimensionalOS#23

Three intertwined fixes from the DIFOP rollout test.

1. Mount matrix R_BASE_LIDAR_{FRONT,REAR} corrected. The old matrix
   collapsed the IMU-chip-to-lidar-housing rotation into the mount
   matrix, which worked pre-DIFOP (as R_base_from_IMU) but
   double-applied the rotation once DIFOP was layered on top — gravity
   landed on base -Y instead of +Z. User confirmed physical mount:
   horizontal, dome forward, cable exits upward through the top. That
   pins: lidar Z (dome) -> base +X (forward); lidar X (cable) -> base
   +Z (up); lidar Y -> base -Y (right, right-hand rule). Codex
   verified the cross-product, the Rz(180 deg) identity for rear, and
   the R_base_lidar * R_imu_to_lidar composition order. Live
   stationary now reads (~0.1, ~-0.07, +9.85) — clean +Z gravity,
   residuals within IMU noise floor, no bandaid bias subtraction.

2. M20_SKIP_STAND env flag on M20Connection. Set to 1 to skip the
   MotionState.STAND send (robot stays on the charging dock) while
   still sending UsageMode.NAVIGATION + GaitType.AGILE_FLAT — those
   are required, other usage modes can power off the lidars per M20
   dev guide. stop() skips the mirror-image SIT too.

3. FASTLIO2_LOG Finding dimensionalOS#22 (mount matrix derivation) and Finding
   dimensionalOS#23 (two drdds_recv processes root cause + deploy.sh symlink loop
   bug — the old drdds-bridge.service was shadowing the deploy-
   provisioned drdds-recv.service; load avg 38, imu lag -0.7s,
   catastrophic drift. Disabling the old service dropped load to 1.2
   and imu_vs_frame_end to -0.05s).

5-min stationary: 1 keyframe at origin, cal=OK throughout,
imu_vs_frame_end bounded at ~70ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
1. deploy.sh sync now picks the newest nix store output by mtime (via
   ls -dt … | head -1) instead of whichever hash sorted last
   alphabetically in the for-loop. That silent revert bit us twice
   today — rebuilt binary, synced, and the symlink landed on a stale
   pre-DIFOP store path.

2. deploy.sh provision now explicitly stops + disables the legacy
   drdds-bridge.service if present. The old rig provisioning
   installed it running the same drdds_recv binary that our own
   drdds-recv.service uses; leaving both enabled spawns two
   drdds_recv processes, saturates NOS (load avg 38), stalls IMU
   delivery, and causes catastrophic SLAM drift. See Finding dimensionalOS#23.

3. FASTLIO2_LOG Finding dimensionalOS#21 marked superseded by dimensionalOS#22 with a one-line
   pointer to the real resolution (the empirical mount matrix was
   R_base_from_IMU, not R_base_from_lidar).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: dimos/crew/ace
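The mtime-based selection in point 1 looks like this in sketch form (the real fix lives in deploy.sh's shell; directory and substring here are illustrative):

```python
import pathlib

def newest_output(store_dir, needle: str):
    """Pick the most recently built matching store path by mtime, mirroring
    the ls -dt | head -1 fix instead of relying on alphabetical hash order."""
    matches = [p for p in pathlib.Path(store_dir).iterdir() if needle in p.name]
    return max(matches, key=lambda p: p.stat().st_mtime, default=None)
```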