Conversation
…nto jeff/fix/rosnav8
…nto jeff/fix/rosnav8
…dimos into jeff/fix/rosnav8
Twelve fixes from Paul-style code review of #1780: native_module.py: - _maybe_build: use did_change(update=False) + post-success update_cache so failed builds never poison the rebuild cache - delete stray "Source files changed" log line that fired on every call - stop(): per-instance _stop_lock; capture proc/watchdog refs under the lock, do signal/wait/join outside the lock to avoid the watchdog-self- join deadlock; second caller no-ops via _stopping - _read_log_stream: receive pid as a parameter instead of reaching into self._process.pid (TOCTOU with stop()) - _child_preexec_linux + ctypes hoisted to module scope under a sys.platform guard; no more re-import per start() - _clear_nix_executable: gracefully handle cwd=None (skip the walk entirely), use .resolve() comparison so a symlinked cwd still terminates the walk, refuse to walk past cwd's tree change_detect.py: - fold max_file_size into _hash_files digest so different thresholds against the same cache_name can't corrupt each other - new _locked_cache_file context manager — flock the .hash file directly in "a+" mode; no more orphan .lock sidecars accumulating in the cache dir; did_change/update_cache/clear_cache all share the helper Tests: - new test_should_rebuild_true_bypasses_change_check for the explicit "always rebuild" path - new test_failed_build_does_not_mark_cache_clean for the update=False retry semantics Build system: - tiledb darwin overlay: filter patches by name instead of dropping all of them, so a future security patch from nixpkgs survives - livox_sdk_config.hpp: honor $TMPDIR on Darwin instead of hardcoding /tmp Perf script: - compute cache_name inline (using the same inspect-based source_file pattern) instead of calling the inlined _build_cache_name method All 26 tests across test_change_detect.py + test_native_module.py + test_native_rebuild.py pass. ruff + mypy clean. mid360_native and fastlio2_native still build end-to-end on aarch64-darwin. Linux drvPaths for libpqxx/tiledb/pcl/vtk verified unchanged by the overlay.
| # (roof is at ~3 m+). | ||
| MAX_ALLOWED_Z = 2.0 | ||
|
|
||
| lcm_url = os.environ.get("LCM_DEFAULT_URL", "udpm://239.255.76.67:7667?ttl=0") |
There was a problem hiding this comment.
Use _DEFAULT_LCM_URL from lcmservice.py. No need to duplicate it here.
| if robot_z > max_z: | ||
| max_z = robot_z | ||
|
|
||
| lc.subscribe(ODOM_TOPIC, _odom_handler) |
There was a problem hiding this comment.
Need to unsubscribe.
| MAX_ALLOWED_Z = 2.0 | ||
|
|
||
| lcm_url = os.environ.get("LCM_DEFAULT_URL", "udpm://239.255.76.67:7667?ttl=0") | ||
| lc = lcmlib.LCM(lcm_url) |
There was a problem hiding this comment.
It's a 20000 line PR. You're not saving disk space by using lc instead of lcm. :)
| ) | ||
|
|
||
| finally: | ||
| print("\n[test-simple] Stopping blueprint…") |
There was a problem hiding this comment.
Please add in conventions to not add print() in tests.
| print(f"[test-simple] Odom online. Robot at ({robot_x:.2f}, {robot_y:.2f})") | ||
|
|
||
| print(f"[test-simple] Warming up for {WARMUP_SEC}s…") | ||
| time.sleep(WARMUP_SEC) |
There was a problem hiding this comment.
Sometimes sleep is unavoidable, but is there no way to pool for a condition than to wait 15 seconds every time?
There was a problem hiding this comment.
actually yeah, if your module start() has returned your module is assumed ready
| Test sequence: | ||
| p0 (-0.3, 2.5) — open corridor speed test | ||
| p1 (11.2, -1.8) — navigate with furniture | ||
| p2 ( 3.3, -4.9) — intermediate waypoint near doorway (explore lower area) | ||
| p3 ( 7.0, -5.0) — through the doorway into the right room | ||
| p4 (11.3, -5.6) — explore right room | ||
| p4→p1 (11.2, -1.8) — CRITICAL: must route through doorway, NOT wall |
There was a problem hiding this comment.
| Test sequence: | |
| p0 (-0.3, 2.5) — open corridor speed test | |
| p1 (11.2, -1.8) — navigate with furniture | |
| p2 ( 3.3, -4.9) — intermediate waypoint near doorway (explore lower area) | |
| p3 ( 7.0, -5.0) — through the doorway into the right room | |
| p4 (11.3, -5.6) — explore right room | |
| p4→p1 (11.2, -1.8) — CRITICAL: must route through doorway, NOT wall |
| import lcm as lcmlib | ||
| import pytest | ||
|
|
||
| os.environ.setdefault("DISPLAY", ":1") |
There was a problem hiding this comment.
Please do not change the system after the test. Why is this even needed?
| from dimos.visualization.vis_module import vis_module | ||
|
|
||
| # -- Clear stale nav paths from previous runs ------------------------- | ||
| paths_dir = Path(__file__).resolve().parents[3] / "data" / "smart_nav_paths" |
There was a problem hiding this comment.
Is this get_data("smart_nav_paths")? Why are you removing it? Everything in data should be static.
| for f in paths_dir.iterdir(): | ||
| f.unlink(missing_ok=True) | ||
|
|
||
| # -- Build blueprint (same composition as unitree_g1_nav_sim) ---------- |
There was a problem hiding this comment.
This does look pretty identical. So why not use it instead of copying it here?
| pytestmark = [pytest.mark.slow] | ||
|
|
||
|
|
||
| class TestCrossWallPlanning: |
There was a problem hiding this comment.
This code is pretty much identical to dimos/navigation/smart_nav/tests/test_cross_wall_planning_simple.py . Please extract to common parts (like the waypoint following) into fixtures.
I already have something similar as follow_points in dimos/e2e_tests/conftest.py. Can that not be used here?
Also, it might be better to move these tests to dimos/e2e_tests.
| ("result-tare-planner", "tare_planner"), | ||
| ] | ||
| _HAS_BINARIES = all((_NATIVE_DIR / d / "bin" / name).exists() for d, name in _REQUIRED_BINARIES) | ||
| _IS_LINUX_X86 = platform.system() == "Linux" and platform.machine() in ("x86_64", "AMD64") |
There was a problem hiding this comment.
You use _IS_LINUX_X86 in several places. Please move it to dimos/conftest.py as skipif_macos. But I don't understand why we can't build on Mac OS.
| registered_scan: Out[PointCloud2] | ||
| odometry: Out[Odometry] | ||
|
|
||
| def __init__(self, **kwargs): # type: ignore[no-untyped-def] |
There was a problem hiding this comment.
| def __init__(self, **kwargs): # type: ignore[no-untyped-def] | |
| def __init__(self, **kwargs): |
no typechecking in tests.
| def __getstate__(self) -> dict[str, Any]: | ||
| state = super().__getstate__() | ||
| state.pop("_cmd_lock", None) | ||
| state.pop("_sensor_thread", None) | ||
| state.pop("_sim_thread", None) | ||
| return state | ||
|
|
||
| def __setstate__(self, state: dict[str, Any]) -> None: | ||
| super().__setstate__(state) | ||
| self._cmd_lock = threading.Lock() | ||
| self._sensor_thread = None | ||
| self._sim_thread = None |
There was a problem hiding this comment.
Are you sure this is needed?
| @rpc | ||
| def start(self) -> None: | ||
| super().start() | ||
| self.cmd_vel._transport.subscribe(self._on_cmd_vel) |
|
|
||
| now = time.time() | ||
| quat = Quaternion.from_euler(Vector3(0.0, 0.0, self._yaw)) | ||
| self.odometry._transport.publish( |
| lock: threading.Lock = field(default_factory=threading.Lock) | ||
|
|
||
|
|
||
| # Test |
There was a problem hiding this comment.
| # Test |
| from dimos.core.coordination.blueprints import autoconnect | ||
| from dimos.core.coordination.module_coordinator import ModuleCoordinator | ||
| from dimos.msgs.geometry_msgs.PointStamped import PointStamped | ||
| from dimos.msgs.nav_msgs.Path import Path as NavPath | ||
| from dimos.navigation.smart_nav.modules.local_planner.local_planner import LocalPlanner | ||
| from dimos.navigation.smart_nav.modules.path_follower.path_follower import PathFollower | ||
| from dimos.navigation.smart_nav.modules.tare_planner.tare_planner import TarePlanner | ||
| from dimos.navigation.smart_nav.modules.terrain_analysis.terrain_analysis import TerrainAnalysis |
| tare = coordinator.get_instance(TarePlanner) | ||
| planner = coordinator.get_instance(LocalPlanner) | ||
| follower = coordinator.get_instance(PathFollower) | ||
| coordinator.get_instance(MockVehicle) | ||
| terrain = coordinator.get_instance(TerrainAnalysis) | ||
|
|
||
| def _on_wp(msg: PointStamped) -> None: | ||
| with collector.lock: | ||
| collector.waypoints.append((msg.x, msg.y, msg.z)) | ||
|
|
||
| def _on_terrain(msg: PointCloud2) -> None: | ||
| with collector.lock: | ||
| collector.terrain_maps.append(True) | ||
|
|
||
| def _on_path(msg: NavPath) -> None: | ||
| with collector.lock: | ||
| collector.paths.append(msg) | ||
|
|
||
| def _on_cmd(msg: Twist) -> None: | ||
| with collector.lock: | ||
| collector.cmd_vels.append((msg.linear.x, msg.linear.y, msg.angular.z)) | ||
|
|
||
| tare.way_point._transport.subscribe(_on_wp) | ||
| planner.path._transport.subscribe(_on_path) | ||
| follower.cmd_vel._transport.subscribe(_on_cmd) | ||
| terrain.terrain_map._transport.subscribe(_on_terrain) |
There was a problem hiding this comment.
This is the same pattern you're using in dimos/navigation/smart_nav/tests/test_waypoint_nav.py except you're using lock.acquire() instead of with lock.
You can have a fixture which starts the blueprint and records data. The test should use the fixture and assert that the data is what you expect.
|
|
||
|
|
||
| class PathFollower(NativeModule): | ||
| """Pure pursuit path follower with PID yaw control. |
There was a problem hiding this comment.
we shouldn't use pure pursuit for holonomic robots, just for an issue for later
|
|
||
| def _query_pose(self) -> tuple[float, float, float]: | ||
| """Return (x, y, z) from the TF tree, falling back to cached values.""" | ||
| for parent, child in self._TF_POSE_QUERIES: |
There was a problem hiding this comment.
why aren't you simply using self.tf.get ? what is this for? I see cache mentioned but tf service does offer built in caching
| return len(self._key_poses) | ||
|
|
||
|
|
||
| class PGO(Module): |
There was a problem hiding this comment.
why do we have DIY python only PGO? aren't we using some fastlio pgo stuff?
| return [s, int((ts - s) * 1_000_000_000)] | ||
|
|
||
|
|
||
| class GraphNodes3D(Timestamped): |
There was a problem hiding this comment.
dimos.msgs structure 100% matches ros msgs, you shouldn't add new custom classes to this nav_msgs dir but create a dimos.msgs.something subdir for your custom messages.
they can later be upgraded to be a part of our actual dimos-lcm repo for multi lang support if we choose to, but shouldn't pollute ROS namespace
| "rerun", | ||
| rerun_config={ | ||
| "visual_override": { | ||
| "world/lidar": lambda grid: grid.to_rerun(voxel_size=voxel_size, mode="boxes"), |
There was a problem hiding this comment.
You cannot use lambdas in blueprints because they can't be pickled to be sent to modules. Are these values not sent to modules?
You have to use functions with names because they can be serialized. (This worked in dask because dask serializes functions by bytecode.)
| pass | ||
| # Also grab addresses via DGRAM trick for interfaces without DNS | ||
| try: | ||
| import subprocess |
| super().stop() | ||
| # Subscribe to our own odometry output so we can mirror each | ||
| # pose update into the TF tree as an odom→body transform. | ||
| self.odometry.transport.subscribe(self._on_odom_for_tf, self.odometry) |
|
|
||
|
|
||
| class TestFarPlannerConfig: | ||
| """Test FarPlanner configuration.""" |
There was a problem hiding this comment.
More tests for values. Please remove.
I don't see any code in dimos/navigation/smart_nav/modules/far_planner/far_planner.py . There's nothing to test.
| @@ -0,0 +1,102 @@ | |||
| # Copyright 2026 Dimensional Inc. | |||
There was a problem hiding this comment.
Please remove this file too.
| logger.debug( | ||
| "Global map published", | ||
| points=len(cloud_np), | ||
| keyframes=pgo.num_key_poses, | ||
| ) |
There was a problem hiding this comment.
| logger.debug( | |
| "Global map published", | |
| points=len(cloud_np), | |
| keyframes=pgo.num_key_poses, | |
| ) |
| self.odometry.subscribe(self._on_odom) | ||
| self.registered_scan.subscribe(self._on_scan) |
| import uuid | ||
|
|
||
|
|
||
| def get_local_ips() -> list[tuple[str, str]]: |
There was a problem hiding this comment.
There's another _get_local_ips in dimos/hardware/sensors/lidar/fastlio2/module.py. Should this replace that? Also move the imports to the top.
Pre-requisite PRs
Will be perpetually re-testing hardware after merges/changes to ^
631af1etested for 2 hours on the g1 (entire battery! instantly responsive after 2hrs)c54578a044f892fe7f231b468f2f3512adf4282a(later) hardware tested and sim tested.Problem
Robot blueprints each duplicated viewer backend selection logic (foxglove vs rerun vs rerun-web vs rerun-connect), making it error-prone to add new backends or change behavior. The RerunBridge also lacked a WebSocket server for dimos-viewer keyboard/click events in --connect mode.
Solution
Breaking Changes
How to Test
Contributor License Agreement