(fix): Type-stable task-local pool accessors for CPU/CUDA/Metal #42
Merged
Conversation
The return type assertions on `get_task_local_*_pool()` were missing type parameters, causing type instability when pools are retrieved from task-local storage (an `IdDict{Any,Any}`). Added concrete parametric types to all return assertions and `Dict` value types. Added `@inferred` regression tests for all three backends, covering both the fast path (existing pool) and the slow path (fresh task creation).
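The instability and its fix can be reproduced in a minimal, self-contained sketch. `FakePool` and the accessor names below are hypothetical stand-ins, not the package's actual API; only `RUNTIME_CHECK` is a name from this PR:

```julia
# `FakePool` is a hypothetical stand-in for the package's pool type.
struct FakePool{N}
    buffers::Vector{Vector{Float64}}
end
FakePool{N}() where {N} = FakePool{N}(Vector{Float64}[])

const RUNTIME_CHECK = 0  # stand-in value for the package's check-level parameter

# Unstable: `task_local_storage()` is an IdDict{Any,Any}, and the bare
# `::FakePool` assertion leaves the type parameter unknown to the compiler.
get_pool_unstable() =
    get!(() -> FakePool{RUNTIME_CHECK}(), task_local_storage(), :pool)::FakePool

# Stable: asserting the concrete parametric type makes the result inferable.
get_pool_stable() =
    get!(() -> FakePool{RUNTIME_CHECK}(), task_local_storage(), :pool)::FakePool{RUNTIME_CHECK}
```

`Base.return_types(get_pool_unstable)` reports only the abstract `FakePool`, while the stable variant infers the concrete `FakePool{0}`.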
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   95.48%   95.48%
=======================================
  Files          14       14
  Lines        3232     3232
=======================================
  Hits         3086     3086
  Misses        146      146
```
Pull request overview
Fixes type instability in task-local pool accessors by asserting concrete parametric pool types (CPU/CUDA/Metal) and using concretely-typed per-device pool dictionaries, preventing boxing on pool access and preserving the library’s zero-allocation goals.
Changes:
- Add concrete parametric type assertions to `get_task_local_pool`, `get_task_local_cuda_pool`, and `get_task_local_metal_pool`.
- Make task-local per-device GPU pool dictionaries concretely typed by pool parameters.
- Add `@inferred` regression tests for fast-path (cache hit) and slow-path (fresh `Task`) access for CPU/CUDA/Metal.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/task_local_pool.jl` | Tightens return type assertion for CPU task-local pool to `AdaptiveArrayPool{RUNTIME_CHECK}`. |
| `ext/AdaptiveArrayPoolsCUDAExt/task_local_pool.jl` | Uses `Dict{Int, CuAdaptiveArrayPool{RUNTIME_CHECK}}` and asserts concrete return type for CUDA task-local pool. |
| `ext/AdaptiveArrayPoolsMetalExt/task_local_pool.jl` | Uses `Dict{UInt64, MetalAdaptiveArrayPool{RUNTIME_CHECK, METAL_STORAGE}}` and asserts concrete return type for Metal task-local pool. |
| `test/test_task_local_pool.jl` | Adds `@inferred` coverage for CPU task-local pool fast/slow paths. |
| `test/cuda/test_extension.jl` | Adds `@inferred` coverage for CUDA task-local pool and updates `Dict` `isa` checks to the concrete parametric type. |
| `test/metal/test_task_local_pool.jl` | Adds `@inferred` coverage for Metal task-local pool and updates `Dict` `isa` checks to the concrete parametric type. |
| `test/metal/runtests.jl` | Imports `METAL_STORAGE` from the extension for parametric type checks in tests. |
Problem
`get_task_local_pool()`, `get_task_local_cuda_pool()`, and `get_task_local_metal_pool()` were type-unstable because their return type assertions used bare abstract types without parametric type parameters. Since `task_local_storage()` returns an `IdDict{Any,Any}`, the compiler cannot infer concrete return types without explicit parametric assertions. This caused runtime boxing on every pool access, directly undermining the zero-allocation goal.

Root Cause
Julia's type system treats parametric types as invariant: `AdaptiveArrayPool{0} <: AdaptiveArrayPool` is true, but the compiler cannot narrow `::AdaptiveArrayPool` to `::AdaptiveArrayPool{0}` without the explicit parameter. The same applied to the `Dict` value types used for multi-device GPU pools.

Fix
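The narrowing failure is easy to reproduce with a stand-in type (`Pool` below is hypothetical; the mechanics are the same for the pool types in this PR):

```julia
# `Pool` is a hypothetical stand-in used only to show the inference behavior.
struct Pool{N} end

unstable(d) = d[:p]::Pool      # bare UnionAll assertion: abstract result
stable(d)   = d[:p]::Pool{0}   # concrete assertion: inferable result

d = IdDict{Any,Any}(:p => Pool{0}())

# What the compiler can prove about each return type:
Base.return_types(unstable, Tuple{IdDict{Any,Any}})  # Any[Pool]    (abstract)
Base.return_types(stable,   Tuple{IdDict{Any,Any}})  # Any[Pool{0}] (concrete)
```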
Added concrete type parameters to all return assertions and `Dict` value types:

- `::AdaptiveArrayPool` → `::AdaptiveArrayPool{RUNTIME_CHECK}`
- `::CuAdaptiveArrayPool` → `::CuAdaptiveArrayPool{RUNTIME_CHECK}`
- `::MetalAdaptiveArrayPool` → `::MetalAdaptiveArrayPool{RUNTIME_CHECK, METAL_STORAGE}`
- `Dict{Int, CuAdaptiveArrayPool}` → `Dict{Int, CuAdaptiveArrayPool{RUNTIME_CHECK}}`
- `Dict{UInt64, MetalAdaptiveArrayPool}` → `Dict{UInt64, MetalAdaptiveArrayPool{RUNTIME_CHECK, METAL_STORAGE}}`

Tests
Added `@inferred` regression tests for all three backends, covering both code paths:

- Fast path: the pool already exists in the current task's storage
- Slow path: a fresh `Task` where the pool must be created (`Threads.@spawn`)

Changed Files
- `src/task_local_pool.jl` — CPU return type assertion
- `ext/AdaptiveArrayPoolsCUDAExt/task_local_pool.jl` — CUDA return + `Dict` types
- `ext/AdaptiveArrayPoolsMetalExt/task_local_pool.jl` — Metal return + `Dict` types
- `test/test_task_local_pool.jl` — CPU `@inferred` tests
- `test/cuda/test_extension.jl` — CUDA `@inferred` tests + `isa` fixes
- `test/metal/test_task_local_pool.jl` — Metal `@inferred` tests + `isa` fixes
- `test/metal/runtests.jl` — import `METAL_STORAGE` const for tests
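The fast/slow-path `@inferred` tests can be sketched as follows. `DemoPool` and `demo_task_pool` are stand-ins for the package's pool type and accessor; only the `@inferred`-over-both-paths structure mirrors the PR's tests:

```julia
using Test

# Stand-ins for the package's pool type and task-local accessor.
struct DemoPool{N} end
const RUNTIME_CHECK = 0

demo_task_pool() =
    get!(() -> DemoPool{RUNTIME_CHECK}(), task_local_storage(), :pool)::DemoPool{RUNTIME_CHECK}

@testset "task-local pool inference" begin
    # Slow path: a fresh Task, where the pool must be created.
    t = Threads.@spawn @inferred(demo_task_pool())
    @test fetch(t) isa DemoPool{RUNTIME_CHECK}

    # Fast path: the pool already exists in this task's storage.
    demo_task_pool()
    @test @inferred(demo_task_pool()) isa DemoPool{RUNTIME_CHECK}
end
```

`@inferred` throws if the inferred return type of the call does not match the actual type of the result, so a regression back to an abstract assertion fails the test on either path.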