feat: Deep Robotics M20 autonomous navigation support #1768
Draft
aphexcx wants to merge 1609 commits into dimensionalOS:dev from
Conversation
# Conflicts:
#   dimos/manipulation/blueprints.py
#   dimos/robot/all_blueprints.py
…tion Resolve merge conflicts keeping dimos3-specific WebsocketVisModule while adopting vis_module branch improvements: deferred LCM instantiation, collapsed rerun-connect case, per-client teleop tracking, configurable start_timeout, connect-mode gRPC serving, and lru_cache cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add DrddsLidarBridge — a C++ NativeModule that reads lidar + IMU from POSIX SHM (written by drdds_recv) and publishes to LCM with full ring/time field preservation for ARISE SLAM. Replaces ros2_pub.cpp (ROS2 output) with direct LCM output, eliminating the Docker container.

New m20_smartnav_native blueprint runs the entire nav stack on the host: DrddsLidarBridge → AriseSLAM → smart_nav() → M20Connection.

Includes nix build infrastructure (CMakeLists.txt + flake.nix) and a migration plan with Codex-reviewed critical issues.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
…city controller M20Connection now switches to NAVIGATION mode + agile gait when using the DDS velocity controller (native nav without rclpy), not just when rclpy is available. Also skips dead-reckoning odometry startup since AriseSLAM provides odometry in that configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Dead-reckoning is no longer needed — AriseSLAM provides odometry. Mac bridge (TCP to GOS) is replaced by native DrddsLidarBridge. Simplifies M20Connection to: camera (RTSP), velocity control (/NAV_CMD via drdds or UDP), and robot state management. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Nix's coreutils (glibc 2.42) uses the fchmodat2 syscall, which kernel 5.10 (Rockchip RK3588) doesn't have. Override unpackPhase and fixupPhase to use host coreutils (/usr/bin/cp, /usr/bin/chmod) instead. Also requires --option sandbox false and /nix bind-mounted to ext4 (NOS root is overlayfs, which has additional cp permission issues).

Build with: nix build --extra-experimental-features 'nix-command flakes' --option sandbox false

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
… cleanup

- Fix DrddsLidarBridge cwd from absolute to relative "cpp"
- Set build_command=None for all NativeModules (pre-built via nix)
- Fix M20Connection IP (use literal instead of global_config at load time)
- Remove dead-reckoning odometry entirely
- Remove mac bridge client
- Add arise-build-wrapper.nix for kernel 5.10 workaround
- Update DrddsLidarBridge flake with unpackPhase/fixupPhase host coreutils fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
…ocker

Full native pipeline verified on NOS:
- DrddsLidarBridge → AriseSLAM → TerrainAnalysis → LocalPlanner → PathFollower → PGO → CmdVelMux → M20Connection
- All NativeModules built via nix on ARM64 (kernel 5.10 workaround)
- Point clouds + color images streaming in dimos-viewer
- WASD teleop reaches CmdVelMux

Blocker: /NAV_CMD velocity commands can't reach the AOS motor controller from NOS — drdds SHM is local-only, with no cross-board transport configured. Investigating drdds UDP transport configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
…blocker Finding no.2: drdds SDK hardcodes SHM-only transport. Cross-board /NAV_CMD (NOS→AOS) is impossible via drdds config alone. Investigated drqos.xml modifications, FASTRTPS_DEFAULT_PROFILES_FILE env var, and enable_config flag — none enable UDP transport. Recommended fix: TCP/UDP bridge on AOS that receives velocity commands from NOS and publishes /NAV_CMD locally via drdds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
…'t match

Findings:
- DrDDSPublisher matched_count=0 even locally on AOS (same machine)
- enable_discovery=true didn't help
- basic_server uses NavCmdPubSubType but the topic name may differ
- The only proven /NAV_CMD path was mac_bridge via rclpy on GOS
- DrDDSPublisher may create DDS endpoints incompatible with ROS2

Next: lightweight ROS2 velocity bridge on GOS (same pattern as mac_bridge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
DrDDSPublisher/Channel creates DDS endpoints incompatible with what basic_server expects. Only proven working /NAV_CMD path was via rclpy on GOS (mac_bridge). Need to investigate ROS2 vs raw drdds publisher compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Replace drdds nav_cmd_pub with TCP socket to nav_cmd_rclpy_bridge on
AOS. The rclpy bridge publishes /NAV_CMD via ROS2 which is the only
proven path that basic_server accepts.
Chain: M20Connection → TCP:9740 → AOS rclpy bridge → /NAV_CMD
Bridge connection verified (bridge=True in logs).
Remaining issue: CmdVelMux receives teleop ("Teleop active") but
doesn't output non-zero cmd_vel — needs investigation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
Three bugs found by Codex preventing WASD teleop from reaching the robot:

1. velocity_controller_dds.py: no reconnect on initial TCP connect failure — the controller stayed dead until a full restart. Added periodic reconnect attempts in _publish_once().
2. nav_cmd_rclpy_bridge.py: the bridge wedged after the first client disconnect, looping on a dead socket instead of returning to accept(). Fixed recv logic with a _recv_packet() helper.
3. transport.py: LCMTransport.broadcast() started an unnecessary receive loop on publish-only transports, adding socket/listener load.

Tests added for all three fixes (4 passed).

Co-Authored-By: Codex (GPT-5.4) <noreply@openai.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
The breakthrough: adding a host route on NOS (10.21.33.103 via 10.21.31.103)
lets rclpy reach AOS's DDS discovery on the 10.21.33.x subnet where
basic_server announces. No GOS bridge, no TCP hop, no firmware changes.
NOS rclpy publisher matches AOS basic_server in <1 second.
WASD in dimos-viewer moves the robot.
Full native architecture (no Docker, no bridge):
NOS: DrddsLidarBridge → AriseSLAM → SmartNav → M20Connection
→ rclpy /NAV_CMD (FASTRTPS_DEFAULT_PROFILES_FILE targeting
10.21.33.103) → AOS basic_server → motors
Prerequisites on NOS after reboot:
sudo ip route add 10.21.33.103/32 via 10.21.31.103
sudo ip link set lo multicast on
sudo ip route add 224.0.0.0/4 dev lo
sudo mount --bind /var/opt/robot/data/nix /nix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
Deleted (no longer needed with the native nav stack):
- Docker: Dockerfile.nav, m20_entrypoint.sh, launch_nos.py
- Bridges: nav_cmd_rclpy_bridge.py, nav_cmd_bridge.cpp, nav_cmd_publisher.cpp, ros2_pub.cpp, build_nav_cmd_pub.sh
- Mac bridge: mac_bridge.py, mac_bridge_client.py
- Dead-reckoning: odometry.py
- Old blueprints: m20_rosnav.py, m20_smartnav.py (replaced by m20_smartnav_native.py)
- rosnav_docker.py (Docker container management)
- velocity_controller.py (UDP, replaced by DDS/TCP controller)
- arise-build-wrapper.nix (one-off build helper)
- Tests for removed code

Kept:
- drdds_recv.cpp (SHM bridge for lidar/IMU)
- DrddsLidarBridge NativeModule
- m20_smartnav_native.py (the working blueprint)
- velocity_controller_dds.py (now publishes via rclpy on NOS)
- connection.py (camera + protocol)
- Config files (arise_slam_m20.yaml, fastdds_m20.xml, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
NavCmdPub publishes to rt/NAV_CMD via raw FastDDS 2.14 API, matching AOS basic_server's ROS2 subscriber. No rclpy, no ROS, no TCP bridge. The key insight: ROS2 rmw_fastrtps prefixes DDS topics with 'rt/', so basic_server subscribes on 'rt/NAV_CMD' not '/NAV_CMD'. Our raw FastDDS publisher uses the same convention + type name to match. WASD teleop verified working end-to-end. Full native stack: DrddsLidarBridge → AriseSLAM → SmartNav → CmdVelMux → NavCmdPub (FastDDS rt/NAV_CMD) → AOS basic_server → motors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
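The mangling convention behind the rt/ insight can be made explicit: ROS 2's rmw_fastrtps maps a ROS topic "/NAV_CMD" to DDS topic "rt/NAV_CMD" and a message type pkg/msg/Name to "pkg::msg::dds_::Name_". A small sketch (the NavCmd package name here is an illustrative assumption, not taken from basic_server):

```python
def ros2_dds_topic(ros_topic: str) -> str:
    """ROS 2 'rt/' topic-mangling convention used by rmw_fastrtps."""
    return "rt" + ros_topic if ros_topic.startswith("/") else "rt/" + ros_topic

def ros2_dds_type(pkg: str, msg: str) -> str:
    """ROS 2 IDL type-name mangling: <pkg>::msg::dds_::<Msg>_."""
    return f"{pkg}::msg::dds_::{msg}_"

print(ros2_dds_topic("/NAV_CMD"))  # → rt/NAV_CMD
```

A raw FastDDS publisher that uses the same topic string and type name as the ROS 2 subscriber will match it, which is exactly what NavCmdPub exploits.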
No longer needed — NavCmdPub NativeModule publishes directly via raw FastDDS. No TCP, no rclpy, no Python bridge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Complete investigation log with 5 findings documenting the journey from Docker+ROS to fully native: nix kernel workaround, drdds SHM limitations, rt/ topic prefix discovery, and raw FastDDS solution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Remove all dead code from M20Connection:
- Velocity controller (now handled by NavCmdPub NativeModule)
- cmd_vel In port (NavCmdPub subscribes directly)
- ROS sensors path (rclpy/M20ROSSensors)
- CycloneDDS lidar fallback
- UDP velocity controller
- enable_ros, enable_lidar params

M20Connection is now purely: camera (RTSP) + robot state (UDP protocol for stand/sit/gait/mode). Clean and focused.

Also removed: ros_sensors.py, lidar.py, skill_container.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
- Rename blueprints/rosnav/ → blueprints/nav/ (no ROSNav anymore)
- Delete broken old blueprints (m20_smart, m20_minimal, m20_agentic)
- Fix blueprint: remove deleted params (enable_ros, enable_lidar)
- Fix connection.py: remove unused Twist import
- Rewrite deploy.sh: 700 → 100 lines, just setup + bridge commands
- Delete Docker-era artifacts: launch files, drdds_msgs, fastdds.xml
- Delete ros_sensors.py, lidar.py, skill_container.py

Final M20 file count: 21 files (down from 59)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
- Rename docker/ → config/ (nothing runs in Docker)
- Move deploy.sh to M20 root (not a Docker tool)
- Move drdds_recv.cpp + shm_transport.h into drdds_bridge/cpp/ (consolidated with DrddsLidarBridge and NavCmdPub)
- Delete empty docker/ directory

19 files total. Clean layout:
- blueprints/nav/ — working blueprint
- config/ — ARISE + planner YAML
- drdds_bridge/ — all C++ NativeModules
- docs/ — M20 dev guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
__init__.py was importing deleted modules (skill_container, velocity_controller, ros_sensors, mac_bridge_client). Simplified to just M20Connection + M20RTSPCamera. deploy.sh: use remote_sudo() helper with NOPASSWD support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
Added:
- sync: rsync dimos source + fix nix binary symlinks
- start: launch smartnav on NOS
- stop: kill smartnav
- restart: stop + start
- viewer: SSH tunnels + dimos-viewer in one command

Quick deploy: ./deploy.sh sync --host gogo && ./deploy.sh restart --host gogo
After reboot: ./deploy.sh setup && bridge-start && start && viewer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
…ourney summary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Executed-By: dimos/crew/ace
- H2: deploy.sh restart now handles missing --host flag correctly
- H3: removed dead _publish_tf/_odom_to_tf and undeclared self.tf
- M4: added comment documenting 10Hz keepalive as intentional
- M5: deploy.sh stop sends SIGTERM first, waits 5s, then SIGKILL
- M7: removed _camera_info type annotation (only set when camera enabled)
- L2: removed unused g_vel_seq atomic counter from nav_cmd_pub.cpp
- L5: updated drdds_recv.cpp and shm_transport.h comments (ros2_pub → DrddsLidarBridge)

Also: removed unused imports (PoseStamped, Quaternion, Transform, Vector3) and the lidar_height param from M20Connection (TF handled elsewhere).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
- airy_imu_bridge: UDP multicast reader for RSAIRY's built-in IMU
  - Parses the 51-byte packet per rs_driver decoder_RSAIRY.hpp
  - FSR-aware unit conversion (g, dps → m/s², rad/s)
  - Empirically-derived rotation into base_link (front/rear)
  - PTP-lock sanity gate drops pre-2024 timestamps
  - 200.0-200.4 Hz verified live
- fastlio2 wrapper: --native_clock CLI flag
  - Bypasses wall-clock-anchor workarounds from Findings dimensionalOS#7-8
  - Enforces a monotonic floor on frame_ts (Finding dimensionalOS#9 revisited)
  - Exposed via FastLio2Config.native_clock
- velodyne.yaml: identity extrinsic + extrinsic_est_en=true (valid only when the IMU comes from airy_imu_bridge)
- M20 blueprint: M20_FASTLIO2_IMU env var selects yesense|airy, wires AiryImuBridge conditionally
- drdds_bridge flake: dontUnpack + dontFixup (kernel 5.10 fchmodat2)
- CMakeLists: BUILD_NAV_CMD_PUB option (skipped in nix; built locally)
- Vendor dimos_native_module.hpp into drdds_bridge/cpp/include/ so the nix build is self-contained
- deploy.sh: symlink airy_imu_bridge alongside drdds_lidar_bridge

Status (FASTLIO2_LOG Finding dimensionalOS#18):
- First ~50s of stationary drift: within 2m of origin (a huge improvement over the yesense path's 28m/15s baseline)
- After ~50s: drift explodes. Diagnosed as fastlio2's LCM IMU subscriber stalling 2-4s while lidar keeps arriving. Codex-predicted single-threaded handler + slow feed_lidar_pc2 on 100k-160k point clouds.
- Y-axis base-frame accel bias (~0.3 m/s²) unresolved. An earlier attempted fix subtracted on the wrong sensor axis; reverted.

Next: split the LCM subscriber into lidar + IMU threads, or add a drop-old IMU queue, or try time_sync_en=true in velodyne.yaml. See FASTLIO2_LOG Finding dimensionalOS#18.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
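The FSR-aware unit conversion reduces to two exact constants plus the usual int16 full-scale scaling. A sketch (the counts-to-FSR scaling is typical MEMS practice and an assumption here; the packet layout itself is defined by decoder_RSAIRY.hpp):

```python
import math

G = 9.80665  # standard gravity: m/s² per g

def counts_to_physical(counts: int, fsr: float) -> float:
    """int16 counts → physical units for a given full-scale range (assumed scaling)."""
    return counts * fsr / 32768.0

def accel_to_ms2(raw_g: float) -> float:
    """Accelerometer sample in g → m/s²."""
    return raw_g * G

def gyro_to_rads(raw_dps: float) -> float:
    """Gyro sample in degrees/second → rad/s."""
    return raw_dps * math.pi / 180.0
```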
…nary)
Stacked fixes that together unblock FAST-LIO2 on M20 with Airy integrated IMU:
1. LCM transport split (codex review 2026-04-21): fastlio2 callbacks now only
copy raw bytes into bounded std::deque queues (lidar cap 3, IMU cap 400).
New fastlio_owner_loop thread drains queues with IMU priority (all pending
IMU drained before each lidar frame) and calls feed_imu / feed_lidar_pc2
serially. Previously a slow lidar callback (100k-pt conversion) blocked
~20 IMU messages in LCM's socket buffer, stalling imu_latest 2-4s, which
caused the 50s-then-explode drift pattern.
2. drdds_recv: 5-channel subscriber (lidar/lidar2/IMU/IMU201/IMU202) writing
to per-channel SHM segments. Enables rsdriver's send_separately:true mode
to deliver both Airy lidars as independent streams in drdds_bridge_lidar
and drdds_bridge_lidar2. CMake BUILD_NAV_CMD_PUB option gates the
host-libs path that also covers this binary.
3. airy_imu_bridge: --frame {base_link|sensor} flag + persistent base-frame
accel bias subtraction (-0.13/-0.35/-0.22 m/s² on x/y/z for front Airy).
Earlier attempt subtracted in sensor frame on the wrong axis (sensor Y
maps to base Z under R_base_from_front, not base Y) — reverted.
Results: 5+ min stationary test = 1 keyframe at origin (vs hundreds of meters
of drift in all prior runs). Zero queue drops. imu_vs_frame_end within ±0.25s
sustainably. Production-grade single-lidar single-IMU baseline.
Next: extend drdds_bridge LCM for 2nd lidar; port FAST_LIO_MULTI async mode
onto our leshy fork base for dual-lidar operation. Per codex review, do those
on top of this clean baseline rather than before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
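The transport split in item 1 can be sketched as two bounded queues plus an owner loop that drains all pending IMU before each lidar frame (queue caps are from the commit; the class and callback names are illustrative, not the actual fastlio2 API):

```python
from collections import deque
from threading import Lock

class FastlioFeeder:
    """Sketch of the LCM transport split: callbacks only enqueue raw bytes;
    a single owner loop drains IMU with priority, then one lidar frame."""

    def __init__(self, process_imu, process_lidar):
        self.imu = deque(maxlen=400)   # bounded: oldest IMU dropped under backlog
        self.lidar = deque(maxlen=3)   # keep at most 3 undecoded clouds
        self.lock = Lock()
        self.process_imu = process_imu
        self.process_lidar = process_lidar

    # LCM callbacks: cheap byte copies only — no decoding on the LCM thread
    def on_imu(self, raw: bytes):
        with self.lock:
            self.imu.append(raw)

    def on_lidar(self, raw: bytes):
        with self.lock:
            self.lidar.append(raw)

    def drain_once(self):
        """One owner-loop iteration: all pending IMU first, then one lidar frame."""
        with self.lock:
            imu_batch = list(self.imu); self.imu.clear()
            cloud = self.lidar.popleft() if self.lidar else None
        for m in imu_batch:
            self.process_imu(m)        # feed_imu stays cheap and serial
        if cloud is not None:
            self.process_lidar(cloud)  # slow 100k-pt conversion no longer blocks IMU
```

The point of the design: the slow point-cloud conversion now runs on the owner thread while new IMU bytes keep landing in the bounded deque, so imu_latest can no longer stall behind a lidar callback.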
Replaces the hardcoded (0.13, 0.35, 0.22) base-frame bias subtraction with the RSAIRY factory IMU-to-lidar quaternion read from DIFOP register C.17 (7×uint32 BE at packet offset 1092, bit-cast to float32). Per-unit, per-robot correct — no more bandaid that would fail under tilt/temperature.

Implementation:
- Parse DIFOP at 224.10.10.201:7781 (front) / 202:7782 (rear), validate the magic header (0xA5FF005A11115555) and tail (0x0FF0), defensively renormalize the quaternion, reject when |norm-1| > 0.01.
- Compose R_imu_to_base = R_base_lidar · R_imu_to_lidar; hot-swap via atomic<Mat3*> on successful DIFOP receipt.
- Background retry thread polls DIFOP every ~2s until success.
- Degraded startup: the IMU path starts immediately with R_imu_to_lidar = I (no blocking on DIFOP); the status line carries a cal=PENDING|OK|N/A flag.
- Shutdown propagation: set g_running=false on recv error so join() doesn't hang.

Also includes Findings dimensionalOS#19/dimensionalOS#20/dimensionalOS#21 in FASTLIO2_LOG.md covering the rsdriver separate mode + discovery ordering work, the LCM transport split, and the DIFOP rollout plan (codex-reviewed, both rounds).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
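The parse-and-validate step can be sketched directly from the numbers in the commit. Assumptions beyond the commit text: that the first four of the seven float32 values are the quaternion (in w, x, y, z order) and that the rest would be a translation; only the offset, endianness, magic header, and norm gate are stated in the source:

```python
import struct

DIFOP_QUAT_OFFSET = 1092
MAGIC = bytes.fromhex("A5FF005A11115555")  # DIFOP header per the commit

def parse_difop_quaternion(pkt: bytes):
    """Extract the factory IMU-to-lidar quaternion from a DIFOP packet.

    Seven big-endian uint32 at offset 1092, bit-cast to float32 (reading
    big-endian float32 directly is the same bit-cast). Returns a renormalized
    (w, x, y, z) tuple, or None when the packet is invalid."""
    if not pkt.startswith(MAGIC):
        return None  # wrong magic header: not a DIFOP packet
    vals = struct.unpack_from(">7f", pkt, DIFOP_QUAT_OFFSET)
    q = vals[:4]  # assumed: quaternion first, translation after
    norm = sum(c * c for c in q) ** 0.5
    if abs(norm - 1.0) > 0.01:   # reject corrupt / uncalibrated payloads
        return None
    return tuple(c / norm for c in q)  # defensive renormalization
```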
…lOS#22/dimensionalOS#23

Three intertwined fixes from the DIFOP rollout test.

1. Mount matrix R_BASE_LIDAR_{FRONT,REAR} corrected. The old matrix collapsed the IMU-chip-to-lidar-housing rotation into the mount matrix, which worked pre-DIFOP (as R_base_from_IMU) but double-applied the rotation once DIFOP was layered on top — gravity landed on base -Y instead of +Z. User confirmed the physical mount: horizontal, dome forward, cable exiting upward through the top. That pins: lidar Z (dome) → base +X (forward); lidar X (cable) → base +Z (up); lidar Y → base -Y (right, right-hand rule). Codex verified the cross-product, the Rz(180 deg) identity for the rear unit, and the R_base_lidar · R_imu_to_lidar composition order. Live stationary now reads (~0.1, ~-0.07, +9.85) — clean +Z gravity, residuals within the IMU noise floor, no bandaid bias subtraction.

2. M20_SKIP_STAND env flag on M20Connection. Set to 1 to skip the MotionState.STAND send (robot stays on the charging dock) while still sending UsageMode.NAVIGATION + GaitType.AGILE_FLAT — those are required; other usage modes can power off the lidars per the M20 dev guide. stop() skips the mirror-image SIT too.

3. FASTLIO2_LOG Finding dimensionalOS#22 (mount matrix derivation) and Finding dimensionalOS#23 (two drdds_recv processes root cause + deploy.sh symlink loop bug — the old drdds-bridge.service was shadowing the deploy-provisioned drdds-recv.service; load avg 38, imu lag -0.7s, catastrophic drift. Disabling the old service dropped load to 1.2 and imu_vs_frame_end to -0.05s).

5-min stationary: 1 keyframe at origin, cal=OK throughout, imu_vs_frame_end bounded at ~70ms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
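The pinned axis mapping fully determines the front mount matrix: its columns are the lidar axes expressed in base coordinates (lidar X → base +Z, lidar Y → base -Y, lidar Z → base +X). A dependency-free sketch verifying it is a proper right-handed rotation:

```python
# Columns = images of the lidar x, y, z axes in the base frame
# (lidar Z/dome -> base +X, lidar X/cable -> base +Z, lidar Y -> base -Y).
R_BASE_FROM_LIDAR_FRONT = [
    [0.0,  0.0, 1.0],
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
]

def matvec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def det3(R):
    """Determinant of a 3x3 matrix (must be +1 for a proper rotation)."""
    return (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
          - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
          + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))

# Dome axis (lidar +Z) maps to base +X (forward); det = +1, so right-handed.
print(matvec(R_BASE_FROM_LIDAR_FRONT, [0.0, 0.0, 1.0]))  # → [1.0, 0.0, 0.0]
```

With this matrix, a stationary accelerometer reading lands on base +Z, matching the (~0.1, ~-0.07, +9.85) result above.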
1. deploy.sh sync now picks the newest nix store output by mtime (via ls -dt … | head -1) instead of whichever hash sorted last alphabetically in the for-loop. That silent revert bit us twice today — rebuilt binary, synced, and the symlink landed on a stale pre-DIFOP store path.

2. deploy.sh provision now explicitly stops + disables the legacy drdds-bridge.service if present. The old rig provisioning installed it running the same drdds_recv binary that our own drdds-recv.service uses; leaving both enabled spawns two drdds_recv processes, saturates NOS (load avg 38), stalls IMU delivery, and causes catastrophic SLAM drift. See Finding dimensionalOS#23.

3. FASTLIO2_LOG Finding dimensionalOS#21 marked superseded by dimensionalOS#22 with a one-line pointer to the real resolution (the empirical mount matrix was R_base_from_IMU, not R_base_from_lidar).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: dimos/crew/ace
Summary
Adds full Deep Robotics M20 quadruped support to dimos, including autonomous navigation via ARISE SLAM running in a Docker container on the robot's NOS board.
What's included
- /NAV_CMD for navigation mode
- color_image streaming, smart_nav visual overrides for terrain/costmap/trajectory
- deploy.sh for remote M20 management (push/pull Docker images, bridge start/stop, service conflict resolution, NAT setup) with Tailscale support
- /NAV_CMD publisher: pybind11 module using host drdds (bypasses rclpy glibc incompatibility on Ubuntu 20.04 NOS)
- drdds_recv (host) + ros2_pub (container) for lidar point cloud + IMU data

Architecture
M20 Hardware Topology
The M20 has three compute boards connected via internal Ethernet (10.21.31.0/24). dimos runs on NOS.
```mermaid
graph TB
  subgraph M20["Deep Robotics M20"]
    subgraph AOS["AOS — Main Controller<br/>10.21.31.103"]
      A1[Motor control]
      A2[lio_perception]
      A3["rsdriver (lidar)"]
      A4["yesense (IMU)"]
      A5[RTSP camera]
      A6["WiFi AP (10.21.41.1)"]
    end
    subgraph GOS["GOS — Comms<br/>10.21.31.104"]
      G1[5G modem]
      G2[Tailscale VPN]
      G3[NAT gateway]
    end
    subgraph NOS["NOS — Compute<br/>10.21.31.106"]
      N1[Docker host]
      N2[dimos runtime]
      N3[drdds_recv]
      N4["Python 3.12 · 16GB RAM · 62GB eMMC"]
    end
    subgraph Sensors["Sensors"]
      S1["2x RoboSense RSAIRY 192-ch lidar<br/>front: 224.10.10.201 · back: 224.10.10.202"]
      S2["Yesense IMU (/IMU_YESENSE)"]
      S3["RTSP Camera (rtsp://AOS:8554/video1)"]
    end
    AOS --- ETH["Internal Ethernet<br/>10.21.31.0/24"]
    GOS --- ETH
    NOS --- ETH
  end
```

Software Stack (NOS)
C++ navigation modules run inside a Docker container (ROS2 Humble, Ubuntu 22.04), while Python modules run on the NOS host (Ubuntu 20.04, Python 3.12). This split is necessary because ROS2 Humble targets Ubuntu 22.04, whose rclpy is glibc-incompatible with the Ubuntu 20.04 host.
```mermaid
graph TB
  subgraph NOS["NOS (10.21.31.106)"]
    subgraph Host["Host — Ubuntu 20.04, Python 3.12 via uv"]
      subgraph Coordinator["dimos ModuleCoordinator (m20_smartnav)"]
        MC["M20Connection<br/>· RTSP camera<br/>· /NAV_CMD (drdds)"]
        PGO["PGO<br/>· loop closure<br/>· iSAM2"]
        CMX["CmdVelMux<br/>· teleop + nav<br/>· velocity mux"]
        CTG["ClickToGoal<br/>· click-to-nav"]
        RR["RerunBridgeModule<br/>gRPC :9877"]
        WS["WebsocketVisModule<br/>Command Center :7779"]
      end
    end
    subgraph Docker["Docker Container 'dimos-nav' — ROS2 Humble, Ubuntu 22.04"]
      R2P["ros2_pub<br/>(SHM → ROS2 bridge)"]
      ARISE["ARISE SLAM<br/>· feature_extraction<br/>· laser_mapping<br/>· imu_preintegration"]
      TA["Terrain<br/>Analysis"]
      LP["Local Planner +<br/>Path Follower"]
      R2P --> ARISE --> TA --> LP
    end
    subgraph Bridge["drdds_recv — host process, root"]
      DR["Subscribes /LIDAR/POINTS + /IMU via drdds<br/>Writes POSIX SHM to /dev/shm"]
    end
    Host -- "RPC (DockerModuleProxy)" --> Docker
    DR -- "/dev/shm (--ipc host)" --> R2P
  end
```

Sensor Data Flow
End-to-end path from hardware sensors through ARISE SLAM to motor commands:
```mermaid
flowchart TB
  subgraph AOS["AOS"]
    RS["rsdriver<br/>/LIDAR/POINTS"]
    YS["yesense<br/>/IMU_YESENSE"]
  end
  subgraph NOS_Host["NOS Host"]
    RECV["drdds_recv<br/>(POSIX SHM)"]
  end
  subgraph Container["Docker Container"]
    PUB["ros2_pub"]
    FE["feature_extraction"]
    LM["laser_mapping"]
    IP["imu_preintegration"]
    TA2["TerrainAnalysis"]
    LP2["LocalPlanner"]
    PF["PathFollower"]
    RNAV["ROSNav Module<br/>(ROS2 → LCM bridge)"]
  end
  subgraph NOS_Modules["NOS Host Modules"]
    PGO2["PGO<br/>loop closure + global map"]
    CMX2["CmdVelMux<br/>teleop + nav merge"]
    M20C["M20Connection"]
    NCP["nav_cmd_pub<br/>(pybind11)"]
  end
  RS -- "drdds SHM" --> RECV
  YS -- "drdds SHM" --> RECV
  RECV -- "/dev/shm" --> PUB
  PUB -- "/bridge/LIDAR_POINTS" --> FE
  PUB -- "/bridge/IMU" --> IP
  FE --> LM
  IP --> LM
  LM -- "/state_estimation" --> RNAV
  LM -- "/registered_scan" --> RNAV
  LM --> TA2 --> LP2 --> PF
  PF -- "cmd_vel" --> RNAV
  RNAV -- "LCM /odometry" --> PGO2
  RNAV -- "LCM /registered_scan" --> PGO2
  RNAV -- "LCM /nav_cmd_vel" --> CMX2
  CMX2 -- "LCM /cmd_vel" --> M20C
  M20C --> NCP
  NCP -- "drdds /NAV_CMD" --> AOS_Motor["AOS<br/>Motor Controller"]
```

Startup Ordering (Critical)
drdds uses FastDDS Shared Memory transport for zero-copy IPC. The SHM discovery mechanism requires subscribers to start before publishers, otherwise the subscriber never discovers the publisher's SHM segments.
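This constraint makes startup a strict sequence with a readiness gate between steps. A minimal sketch of such a sequencer (step names would come from the sequence below; the readiness predicates shown in the usage note are hypothetical):

```python
import time

def run_sequence(steps, timeout_s: float = 30.0, poll_s: float = 0.5):
    """Run (name, start_fn, ready_fn) steps strictly in order, blocking on each
    readiness predicate so SHM subscribers exist before their publishers start."""
    for name, start, ready in steps:
        start()
        deadline = time.monotonic() + timeout_s
        while not ready():
            if time.monotonic() >= deadline:
                raise TimeoutError(f"step '{name}' not ready within {timeout_s}s")
            time.sleep(poll_s)
```

Usage would look like run_sequence([("drdds_recv", start_recv, shm_segments_exist), ("rsdriver", restart_rsdriver, lidar_flowing), ...]), where the start and ready callables are placeholders for the real service controls.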
```mermaid
flowchart LR
  subgraph Order["Startup Sequence"]
    direction TB
    S1["1. drdds_recv starts<br/>Creates SHM segments"]
    S2["2. rsdriver restarts<br/>Discovers SHM, publishes lidar"]
    S3["3. yesense restarts<br/>Discovers SHM, publishes IMU"]
    S4["4. Docker container starts<br/>ros2_pub reads /dev/shm"]
    S5["5. ARISE SLAM activates<br/>Lifecycle: 5/10/15s timers"]
    S6["6. IMU init completes<br/>~200 samples for gravity cal"]
    S7["7. SLAM producing<br/>/state_estimation + /registered_scan"]
    S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7
  end
```

Remote Access (Tailscale)
```mermaid
flowchart LR
  subgraph Mac["Developer Mac"]
    DV["dimos-viewer"]
    T1["localhost:9877"]
    T2["localhost:7779"]
    T3["localhost:3030"]
  end
  subgraph GOS_VPN["GOS (Tailscale)<br/>100.89.134.73"]
    TS["5G + VPN<br/>NAT gateway"]
  end
  subgraph NOS_Ports["NOS"]
    P1[":9877 Rerun gRPC"]
    P2[":7779 Command Center"]
    P3[":3030 RerunWebSocket"]
  end
  T1 -- "SSH tunnel" --> TS
  T2 -- "SSH tunnel" --> TS
  T3 -- "SSH tunnel" --> TS
  TS -- NAT --> P1
  TS -- NAT --> P2
  TS -- NAT --> P3
  DV -- "--connect" --> T1
  DV -- "--ws-url" --> T3
```

Included documentation
This PR includes extensive investigation logs and planning documents that serve as critical context for understanding the M20 platform and the architectural decisions made.
Investigation logs
- INVESTIGATION_LOG.md
- plans/m20-rosnav-migration/ROSNAV_MIGRATION_LOG.md
- plans/m20-rosnav-migration/ARISE_SLAM_LOG.md: documents that undistortionAndscanregistration() is commented out in all forks (Finding no.8), the IMU-triggered processing fix from SuperOdom (Finding no.11), 192-to-64 channel ring remapping (Finding no.2), QoS mismatch (Finding no.3), float32 time truncation (Finding no.4), and lifecycle activation race conditions on ARM64 (Finding no.7).

Planning documents (plans/m20-rosnav-migration/)

- 00-discovery/
- 01-scope/
- 02-spec/
- 03-plan/
- 04-beads/
- 05-drdds-bridge/

Reference documentation
- dimos/robot/deeprobotics/m20/docs/m20-official-software-development-guide.md

Hardware details
Key files
- dimos/robot/deeprobotics/m20/connection.py
- dimos/robot/deeprobotics/m20/blueprints/rosnav/m20_smartnav.py
- dimos/robot/deeprobotics/m20/rosnav_docker.py
- dimos/robot/deeprobotics/m20/velocity_controller_dds.py
- dimos/robot/deeprobotics/m20/docker/deploy.sh
- dimos/robot/deeprobotics/protocol.py
- dimos/robot/deeprobotics/m20/docker/drdds_bridge/
- /state_estimation and /registered_scan verified on M20 hardware
- nav_cmd_pub publishing /NAV_CMD via native drdds

🤖 Generated with Claude Code