…ether.
* Reset fire class usage directly.
* Add easy screenshot saving with multi-view.
* Sync the viewpoint across different windows.
* Visualize the lidar center tf if `slc` is set to True.
* Add teflowLoss into the codebase.
* Update chamfer3D with CUDA stream-style batched compute to keep the GPU busy.

AI summary:
- Added automatic collection of self-supervised loss function names in `src/lossfuncs/__init__.py`.
- Improved documentation and structure of self-supervised loss functions in `src/lossfuncs/selfsupervise.py`.
- Refactored loss calculation logic in `src/trainer.py` to support new self-supervised loss functions.
- Introduced `ssl_loss_calculator` method for handling self-supervised losses.
- Updated training step to differentiate between self-supervised and supervised loss calculations.
- Enhanced error handling during training and validation steps to skip problematic batches.

Update slurm scripts and commands for TeFlow.
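The "automatic collection of self-supervised loss function names" mentioned in the summary can be done with straightforward module introspection. A minimal sketch of that pattern (the function name `collect_ssl_losses` and the `Loss`-suffix naming convention are assumptions for illustration, not the repo's actual code):

```python
import types


def collect_ssl_losses(module: types.ModuleType) -> list:
    """Collect names of loss callables defined in a module.

    Assumption for illustration: self-supervised losses are module-level
    callables whose names end in 'Loss'; the real convention in
    src/lossfuncs/__init__.py may differ.
    """
    return sorted(
        name for name, obj in vars(module).items()
        if callable(obj) and name.endswith("Loss")
    )
```

The trainer can then dispatch by name from this list instead of hard-coding each loss.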
Kin-Zhang
commented
Mar 12, 2026
Comment on lines +96 to +128
```python
def batched(self,
            pc0_list: List[torch.Tensor],
            pc1_list: List[torch.Tensor],
            truncate_dist: float = -1) -> torch.Tensor:
    """Parallel Chamfer loss via B CUDA streams.

    Returns mean-over-samples: (1/B) * Σ_i [mean(dist0_i) + mean(dist1_i)].
    ~1.14× faster than the serial loop on RTX 3090 @ 88K pts/sample;
    more importantly, keeps the GPU busy with one sustained work block per frame.
    """
    B = len(pc0_list)
    if B == 1:
        return self.forward(pc0_list[0], pc1_list[0], truncate_dist)

    streams = self._ensure_streams(B)
    main = torch.cuda.current_stream()
    per_loss: List[torch.Tensor] = [None] * B  # type: ignore[list-item]

    # Fan out: each sample's Chamfer kernels run on their own side stream.
    for i in range(B):
        streams[i].wait_stream(main)
        with torch.cuda.stream(streams[i]):
            d0, d1, _, _ = ChamferDis.apply(pc0_list[i].contiguous(),
                                            pc1_list[i].contiguous())
            if truncate_dist <= 0:
                per_loss[i] = d0.mean() + d1.mean()
            else:
                v0, v1 = d0 <= truncate_dist, d1 <= truncate_dist
                per_loss[i] = torch.nanmean(d0[v0]) + torch.nanmean(d1[v1])

    # Fan in: the main stream waits on every side stream before reducing.
    for i in range(B):
        main.wait_stream(streams[i])

    return torch.stack(per_loss).mean()
```
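`_ensure_streams` is referenced but not part of this hunk; its likely job is to lazily create and cache one side stream per batch slot so streams are reused across training steps rather than recreated each call. A minimal sketch of that caching pattern (the `StreamPool` class name and the injectable `factory` are illustrative assumptions so the pattern can be shown without a GPU; the real helper would construct `torch.cuda.Stream()` directly):

```python
class StreamPool:
    """Lazy, growable pool of per-sample objects (e.g. torch.cuda.Stream).

    The factory is injectable here only so the pattern can be demonstrated
    without CUDA; the actual helper would hard-code torch.cuda.Stream.
    """

    def __init__(self, factory):
        self._factory = factory
        self._pool = []

    def ensure(self, n: int) -> list:
        # Grow the cached pool up to n entries; entries created once are
        # reused on every subsequent call with the same or smaller n.
        while len(self._pool) < n:
            self._pool.append(self._factory())
        return self._pool[:n]
```

With this shape, a batch of size B always gets the same B streams back, so stream-creation overhead is paid once per worker rather than once per step.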
Member
Author
Speed Performance: Stream CUDA vs For-loop
Quick demo benchmark (1 GPU, bz=8, 312 samples):
- Stream CUDA: 1.14 s/it → `Epoch 1: 46%|███████▍ | 18/39 [00:20<00:23, 1.14s/it]`
- For-loop: 1.29 s/it → `Epoch 1: 46%|███████▍ | 18/39 [00:23<00:27, 1.29s/it]`
1.132× faster (~13.2% speedup)
Based on the previous full training run (8 GPUs, bz=16, 153,932 samples), this reduces self-supervised training time from 11 hours to ~9.5 hours.
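As a sanity check, the quoted ratio follows directly from the per-iteration timings above (pure arithmetic; naively scaling the 11 h baseline lands near 9.7 h, slightly above the ~9.5 h estimate, which presumably also reflects other savings):

```python
# Numbers taken from the benchmark above.
stream_s_per_it = 1.14  # Stream CUDA, seconds per iteration
loop_s_per_it = 1.29    # For-loop, seconds per iteration

speedup = loop_s_per_it / stream_s_per_it
print(f"{speedup:.3f}x faster (~{(speedup - 1) * 100:.1f}% speedup)")  # 1.132x, ~13.2%

# Naively scaling the previous 11 h full run by the same ratio:
projected_hours = 11 * stream_s_per_it / loop_s_per_it
print(f"~{projected_hours:.1f} h")  # ~9.7 h
```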
TeFlow got accepted by CVPR 2026. 🎉🎉🎉 Now I'm working on releasing the code.
Please check the progress in the forked teflow branch.
Once it's ready, I will merge it into the codebase with the updated README.