…ether.
* Reset fire class usage directly.
* Add easy screenshot saving with multi-view.
* Sync the viewpoint across different windows.
* Visualize the lidar center tf if `slc` is set to True.
* Add teflowLoss into the codebase.
* Update chamfer3D with CUDA stream-style batched compute to keep the GPU busy.

AI summary:
- Added automatic collection of self-supervised loss function names in `src/lossfuncs/__init__.py`.
- Improved documentation and structure of self-supervised loss functions in `src/lossfuncs/selfsupervise.py`.
- Refactored loss calculation logic in `src/trainer.py` to support new self-supervised loss functions.
- Introduced `ssl_loss_calculator` method for handling self-supervised losses.
- Updated training step to differentiate between self-supervised and supervised loss calculations.
- Enhanced error handling during training and validation steps to skip problematic batches.

Update slurm scripts and commands for TeFlow.
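The "automatic collection of self-supervised loss function names" mentioned in the summary can be done with straightforward module introspection. A minimal sketch of that pattern (the function name `collect_ssl_losses` and the `Loss`-suffix naming convention are assumptions for illustration, not the repo's actual code):

```python
import types


def collect_ssl_losses(module: types.ModuleType) -> list:
    """Collect names of loss callables defined in a module.

    Assumption for illustration: self-supervised losses are module-level
    callables whose names end in 'Loss'; the real convention in
    src/lossfuncs/__init__.py may differ.
    """
    return sorted(
        name for name, obj in vars(module).items()
        if callable(obj) and name.endswith("Loss")
    )
```

The trainer can then dispatch by name from this list instead of hard-coding each loss.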
Kin-Zhang
commented
Mar 12, 2026
Comment on lines +96 to +128
```python
def batched(self,
            pc0_list: List[torch.Tensor],
            pc1_list: List[torch.Tensor],
            truncate_dist: float = -1) -> torch.Tensor:
    """Parallel Chamfer loss via B CUDA streams.

    Returns mean-over-samples: (1/B) * Σ_i [mean(dist0_i) + mean(dist1_i)].
    ~1.14× faster than the serial loop on RTX 3090 @ 88K pts/sample;
    more importantly, keeps the GPU busy with one sustained work block per frame.
    """
    B = len(pc0_list)
    if B == 1:
        return self.forward(pc0_list[0], pc1_list[0], truncate_dist)

    streams = self._ensure_streams(B)
    main = torch.cuda.current_stream()
    per_loss: List[torch.Tensor] = [None] * B  # type: ignore[list-item]

    # Fan out: each sample's Chamfer kernels run on their own side stream.
    for i in range(B):
        streams[i].wait_stream(main)
        with torch.cuda.stream(streams[i]):
            d0, d1, _, _ = ChamferDis.apply(pc0_list[i].contiguous(),
                                            pc1_list[i].contiguous())
            if truncate_dist <= 0:
                per_loss[i] = d0.mean() + d1.mean()
            else:
                v0, v1 = d0 <= truncate_dist, d1 <= truncate_dist
                per_loss[i] = torch.nanmean(d0[v0]) + torch.nanmean(d1[v1])

    # Fan in: the main stream waits on every side stream before reducing.
    for i in range(B):
        main.wait_stream(streams[i])

    return torch.stack(per_loss).mean()
```
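`_ensure_streams` is referenced but not part of this hunk; its likely job is to lazily create and cache one side stream per batch slot so streams are reused across training steps rather than recreated each call. A minimal sketch of that caching pattern (the `StreamPool` class name and the injectable `factory` are illustrative assumptions so the pattern can be shown without a GPU; the real helper would construct `torch.cuda.Stream()` directly):

```python
class StreamPool:
    """Lazy, growable pool of per-sample objects (e.g. torch.cuda.Stream).

    The factory is injectable here only so the pattern can be demonstrated
    without CUDA; the actual helper would hard-code torch.cuda.Stream.
    """

    def __init__(self, factory):
        self._factory = factory
        self._pool = []

    def ensure(self, n: int) -> list:
        # Grow the cached pool up to n entries; entries created once are
        # reused on every subsequent call with the same or smaller n.
        while len(self._pool) < n:
            self._pool.append(self._factory())
        return self._pool[:n]
```

With this shape, a batch of size B always gets the same B streams back, so stream-creation overhead is paid once per worker rather than once per step.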
Member
Author
Speed Performance: Stream CUDA vs For-loop
Quick demo benchmark (1 GPU, bz=8, 312 samples):
- Stream CUDA: 1.14 s/it → `Epoch 1: 46%|███████▍ | 18/39 [00:20<00:23, 1.14s/it]`
- For-loop: 1.29 s/it → `Epoch 1: 46%|███████▍ | 18/39 [00:23<00:27, 1.29s/it]`
1.132× faster (~13.2% speedup)
Based on the previous full training run (8 GPUs, bz=16, 153,932 samples), this reduces self-supervised training time from 11 hours to ~9.5 hours.
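As a sanity check, the quoted ratio follows directly from the per-iteration timings above (pure arithmetic; naively scaling the 11 h baseline lands near 9.7 h, slightly above the ~9.5 h estimate, which presumably also reflects other savings):

```python
# Numbers taken from the benchmark above.
stream_s_per_it = 1.14  # Stream CUDA, seconds per iteration
loop_s_per_it = 1.29    # For-loop, seconds per iteration

speedup = loop_s_per_it / stream_s_per_it
print(f"{speedup:.3f}x faster (~{(speedup - 1) * 100:.1f}% speedup)")  # 1.132x, ~13.2%

# Naively scaling the previous 11 h full run by the same ratio:
projected_hours = 11 * stream_s_per_it / loop_s_per_it
print(f"~{projected_hours:.1f} h")  # ~9.7 h
```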
TeFlow got accepted by CVPR 2026. 🎉🎉🎉 Now I'm working on releasing the code.
Please check the progress in the forked teflow branch.
Once it's ready, I will merge it into the codebase with the updated README.