
feat(cache): add digest-based cache reuse and GC for models #31

Closed
chlins wants to merge 1 commit into main from feat/reuse

@chlins (Member) commented Apr 17, 2026

This pull request introduces a model cache and reuse mechanism to speed up model mounting and avoid redundant downloads. It adds a cache directory layout, cache management with garbage collection (GC), and logic that reuses cached models via hardlinks when possible. It also extends model status tracking with two new fields, ResolvedDigest and Reused.

Key changes include:

Model cache and reuse implementation

  • Introduced a cache directory layout in RawConfig, with path-management helpers including GetCacheDir, GetCacheSHA256Dir, GetCacheKey, and GetCacheModelDir (pkg/config/config.go).
  • Added pkg/service/reuse.go, which resolves model digests, checks for cached models, and hardlinks cached model directories into place for reuse.
  • Updated Worker.PullModel to resolve the digest for a model reference, check for a cached model, and hardlink it into place if available, falling back to a direct pull if needed (pkg/service/worker.go).
  • Modified pullModel to accept the resolved digest, record it together with the reuse status in the model status, and support the new cache workflow (pkg/service/worker.go).

Cache management and garbage collection

  • Refactored CacheManager in pkg/service/cache.go to scan mounted models more cleanly, track active cache keys, and periodically remove unused cache directories older than a TTL (CacheTTL, default 1 hour). The GC and scan intervals are now configurable variables.

Model status enhancements

  • Extended the status.Status struct with ResolvedDigest and Reused fields to track the resolved digest and whether the model was reused from cache (pkg/status/status.go).
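For reference, here is a hypothetical, trimmed-down mirror of the extended status record and how it might round-trip through JSON. The JSON tags, the string-typed State, and the helper names are assumptions for illustration; the real status.Status has more fields.

```go
package main

import "encoding/json"

// Status is a hypothetical, trimmed-down mirror of status.Status with only
// the fields discussed here. The JSON tags and the string-typed State are
// assumptions, not the real definitions.
type Status struct {
	VolumeName     string `json:"volume_name"`
	MountID        string `json:"mount_id"`
	Reference      string `json:"reference"`
	ResolvedDigest string `json:"resolved_digest,omitempty"` // digest the reference resolved to
	Reused         bool   `json:"reused"`                    // true if linked from an existing cache entry
	State          string `json:"state"`
}

// encodeStatus serializes a status record, e.g. for a per-mount status file.
func encodeStatus(s Status) ([]byte, error) { return json.Marshal(s) }

// decodeStatus parses a status record; absent fields keep their zero values.
func decodeStatus(b []byte) (Status, error) {
	var s Status
	err := json.Unmarshal(b, &s)
	return s, err
}
```

With these assumed tags, a status file written before this change still decodes cleanly: the missing fields come back as an empty digest and Reused=false.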

These changes together enable efficient model sharing between mounts, reduce redundant downloads, and provide a foundation for future improvements in cache management and reporting.

@chlins added the enhancement (New feature or request) label Apr 17, 2026
Signed-off-by: chlins <chlins.zhang@gmail.com>
github-actions bot commented Apr 17, 2026

📊 Code Coverage Report

Metric          Coverage   Threshold   Status
Overall         66.9%      70%         below threshold
Changed lines   24%        90%         below threshold
📦 Per-package breakdown
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:104:                    80.0%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:116:                    80.0%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:31:                     85.7%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:60:                     75.0%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:69:                     80.0%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:81:                     80.0%
github.com/modelpack/model-csi-driver/pkg/client/grpc.go:92:                     80.0%
github.com/modelpack/model-csi-driver/pkg/client/http.go:105:                    100.0%
github.com/modelpack/model-csi-driver/pkg/client/http.go:23:                     80.0%
github.com/modelpack/model-csi-driver/pkg/client/http.go:49:                     74.3%
github.com/modelpack/model-csi-driver/pkg/client/request.go:12:                  100.0%
github.com/modelpack/model-csi-driver/pkg/client/request.go:34:                  75.0%
github.com/modelpack/model-csi-driver/pkg/client/request.go:50:                  66.7%
github.com/modelpack/model-csi-driver/pkg/client/request.go:65:                  75.0%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:112:             87.5%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:26:              100.0%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:32:              100.0%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:55:              100.0%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:66:              92.9%
github.com/modelpack/model-csi-driver/pkg/config/auth/docker.go:88:              92.3%
github.com/modelpack/model-csi-driver/pkg/config/auth/keychain.go:20:            100.0%
github.com/modelpack/model-csi-driver/pkg/config/auth/keychain.go:31:            100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:103:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:108:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:113:                  0.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:118:                  0.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:122:                  0.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:127:                  0.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:132:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:137:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:142:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:147:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:152:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:157:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:162:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:167:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:171:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:175:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:179:                  61.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:18:                   87.5%
github.com/modelpack/model-csi-driver/pkg/config/config.go:256:                  83.3%
github.com/modelpack/model-csi-driver/pkg/config/config.go:269:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:277:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:281:                  100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:71:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:75:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:79:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:83:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:87:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:91:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:95:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/config.go:99:                   100.0%
github.com/modelpack/model-csi-driver/pkg/config/watcher.go:13:                  65.2%
github.com/modelpack/model-csi-driver/pkg/logger/logger.go:19:                   100.0%
github.com/modelpack/model-csi-driver/pkg/logger/logger.go:29:                   100.0%
github.com/modelpack/model-csi-driver/pkg/logger/logger.go:41:                   100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/mount_collector.go:21:         100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/mount_collector.go:34:         100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/mount_collector.go:38:         100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/mount_collector.go:42:         83.3%
github.com/modelpack/model-csi-driver/pkg/metrics/registry.go:117:               100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/registry.go:126:               100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/registry.go:135:               100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/registry.go:146:               100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/registry.go:24:                100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/serve.go:26:                   100.0%
github.com/modelpack/model-csi-driver/pkg/metrics/serve.go:37:                   90.9%
github.com/modelpack/model-csi-driver/pkg/metrics/serve.go:59:                   83.3%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:39:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:50:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:54:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:59:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:64:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:69:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:74:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:82:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/builder.go:88:                 80.0%
github.com/modelpack/model-csi-driver/pkg/mounter/mounter.go:15:                 100.0%
github.com/modelpack/model-csi-driver/pkg/mounter/mounter.go:26:                 66.7%
github.com/modelpack/model-csi-driver/pkg/mounter/mounter.go:37:                 90.9%
github.com/modelpack/model-csi-driver/pkg/mounter/mounter.go:57:                 71.4%
github.com/modelpack/model-csi-driver/pkg/mounter/mounter.go:81:                 83.3%
github.com/modelpack/model-csi-driver/pkg/provider/provider.go:15:               100.0%
github.com/modelpack/model-csi-driver/pkg/service/artifact.go:11:                100.0%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:104:                  88.2%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:138:                  71.4%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:154:                  0.0%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:167:                  0.0%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:210:                  83.3%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:41:                   69.7%
github.com/modelpack/model-csi-driver/pkg/service/cache.go:96:                   75.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:101:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:115:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:122:             76.2%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:157:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:164:             50.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:189:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:196:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:203:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:210:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:217:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:22:              84.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:249:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:256:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:263:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:270:             100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller.go:62:              91.3%
github.com/modelpack/model-csi-driver/pkg/service/controller_local.go:143:       79.3%
github.com/modelpack/model-csi-driver/pkg/service/controller_local.go:184:       28.1%
github.com/modelpack/model-csi-driver/pkg/service/controller_local.go:25:        57.3%
github.com/modelpack/model-csi-driver/pkg/service/controller_remote.go:134:      0.0%
github.com/modelpack/model-csi-driver/pkg/service/controller_remote.go:199:      0.0%
github.com/modelpack/model-csi-driver/pkg/service/controller_remote.go:34:       100.0%
github.com/modelpack/model-csi-driver/pkg/service/controller_remote.go:46:       0.0%
github.com/modelpack/model-csi-driver/pkg/service/controller_remote.go:60:       0.0%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:107:         66.7%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:151:         71.4%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:190:         88.9%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:46:          100.0%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:54:          82.4%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server.go:84:          83.3%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:124: 83.3%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:156: 90.9%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:185: 85.7%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:203: 100.0%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:27:  83.3%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:38:  100.0%
github.com/modelpack/model-csi-driver/pkg/service/dynamic_server_handler.go:56:  80.0%
github.com/modelpack/model-csi-driver/pkg/service/identity.go:21:                100.0%
github.com/modelpack/model-csi-driver/pkg/service/identity.go:41:                100.0%
github.com/modelpack/model-csi-driver/pkg/service/identity.go:9:                 100.0%
github.com/modelpack/model-csi-driver/pkg/service/kube.go:14:                    0.0%
github.com/modelpack/model-csi-driver/pkg/service/kube.go:25:                    0.0%
github.com/modelpack/model-csi-driver/pkg/service/kube.go:34:                    0.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:114:                  89.5%
github.com/modelpack/model-csi-driver/pkg/service/model.go:157:                  91.7%
github.com/modelpack/model-csi-driver/pkg/service/model.go:177:                  100.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:27:                   80.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:48:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:60:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:68:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/model.go:96:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:123:                   94.4%
github.com/modelpack/model-csi-driver/pkg/service/node.go:154:                   66.7%
github.com/modelpack/model-csi-driver/pkg/service/node.go:202:                   94.4%
github.com/modelpack/model-csi-driver/pkg/service/node.go:233:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:241:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:249:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:269:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:26:                    100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:34:                    100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:42:                    100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:46:                    100.0%
github.com/modelpack/model-csi-driver/pkg/service/node.go:50:                    48.8%
github.com/modelpack/model-csi-driver/pkg/service/node_dynamic.go:18:            73.3%
github.com/modelpack/model-csi-driver/pkg/service/node_dynamic.go:52:            68.4%
github.com/modelpack/model-csi-driver/pkg/service/node_static.go:16:             81.8%
github.com/modelpack/model-csi-driver/pkg/service/node_static.go:42:             69.2%
github.com/modelpack/model-csi-driver/pkg/service/node_static_inline.go:18:      0.0%
github.com/modelpack/model-csi-driver/pkg/service/node_static_inline.go:54:      57.1%
github.com/modelpack/model-csi-driver/pkg/service/puller.go:42:                  26.9%
github.com/modelpack/model-csi-driver/pkg/service/quota.go:21:                   94.1%
github.com/modelpack/model-csi-driver/pkg/service/quota.go:49:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/quota.go:55:                   100.0%
github.com/modelpack/model-csi-driver/pkg/service/quota.go:67:                   84.2%
github.com/modelpack/model-csi-driver/pkg/service/reuse.go:14:                   18.8%
github.com/modelpack/model-csi-driver/pkg/service/reuse.go:46:                   0.0%
github.com/modelpack/model-csi-driver/pkg/service/reuse.go:57:                   0.0%
github.com/modelpack/model-csi-driver/pkg/service/reuse.go:92:                   0.0%
github.com/modelpack/model-csi-driver/pkg/service/service.go:41:                 100.0%
github.com/modelpack/model-csi-driver/pkg/service/service.go:45:                 0.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:112:                 100.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:121:                 21.3%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:233:                 71.7%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:28:                  100.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:324:                 77.1%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:34:                  100.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:46:                  100.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:62:                  100.0%
github.com/modelpack/model-csi-driver/pkg/service/worker.go:73:                  77.3%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:100:                    100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:107:                    86.7%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:139:                    100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:174:                    88.9%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:195:                    100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:28:                     100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:34:                     100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:46:                     100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:53:                     100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:69:                     100.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:76:                     80.0%
github.com/modelpack/model-csi-driver/pkg/status/hook.go:87:                     80.0%
github.com/modelpack/model-csi-driver/pkg/status/status.go:112:                  87.5%
github.com/modelpack/model-csi-driver/pkg/status/status.go:127:                  83.3%
github.com/modelpack/model-csi-driver/pkg/status/status.go:138:                  100.0%
github.com/modelpack/model-csi-driver/pkg/status/status.go:51:                   75.0%
github.com/modelpack/model-csi-driver/pkg/status/status.go:70:                   100.0%
github.com/modelpack/model-csi-driver/pkg/status/status.go:76:                   66.7%
github.com/modelpack/model-csi-driver/pkg/status/status.go:94:                   100.0%
github.com/modelpack/model-csi-driver/pkg/tracing/tracing.go:22:                 85.7%
github.com/modelpack/model-csi-driver/pkg/tracing/tracing.go:36:                 55.6%
github.com/modelpack/model-csi-driver/pkg/tracing/tracing.go:72:                 100.0%
github.com/modelpack/model-csi-driver/pkg/tracing/tracing.go:79:                 81.8%
github.com/modelpack/model-csi-driver/pkg/utils/utils.go:16:                     100.0%
github.com/modelpack/model-csi-driver/pkg/utils/utils.go:34:                     75.0%
github.com/modelpack/model-csi-driver/pkg/utils/utils.go:54:                     81.8%

total: (statements) 66.9%


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a local caching mechanism for model files to speed up pull operations. It includes logic to resolve model digests, store models in a centralized cache directory that mounts hardlink into to save space, and a background garbage collection process that removes expired entries. A critical issue was identified in the PullModel implementation: singleflight deduplicates on the digest, but the volume-specific side effects (linking and the status update) run inside the shared execution block, so when several requests for the same model arrive concurrently, every caller except the first ends up without a linked model directory. Additionally, the garbage collection logic is unaware of in-flight pull operations, which could lead to premature deletion of cache entries.

Comment thread pkg/service/worker.go
Comment on lines +159 to +221
	_, err, _ = worker.inflight.Do("cache-"+resolvedDigest, func() (interface{}, error) {
		sourceModelDir, found, err := worker.getCachedModelDir(resolvedDigest)
		if err != nil {
			return nil, err
		}
		if !found {
			cacheTmpModelDir := cacheModelDir + ".pulling"
			if err := os.MkdirAll(filepath.Dir(cacheModelDir), 0755); err != nil {
				return nil, errors.Wrapf(err, "create cache dir: %s", cacheModelDir)
			}
			if err := os.RemoveAll(cacheTmpModelDir); err != nil {
				return nil, errors.Wrapf(err, "cleanup temporary cache model dir: %s", cacheTmpModelDir)
			}
			err = worker.pullModel(
				ctx,
				statusPath,
				volumeName,
				mountID,
				reference,
				resolvedDigest,
				cacheTmpModelDir,
				checkDiskQuota,
				excludeModelWeights,
				excludeFilePatterns,
			)
			if err != nil {
				_ = os.RemoveAll(cacheTmpModelDir)
				return nil, err
			}
			if err := os.RemoveAll(cacheModelDir); err != nil {
				_ = os.RemoveAll(cacheTmpModelDir)
				return nil, errors.Wrapf(err, "cleanup cache model dir before rename: %s", cacheModelDir)
			}
			if err := os.Rename(cacheTmpModelDir, cacheModelDir); err != nil {
				_ = os.RemoveAll(cacheTmpModelDir)
				return nil, errors.Wrapf(err, "rename cache model dir to %s", cacheModelDir)
			}
			sourceModelDir = cacheModelDir
		}

		if err := os.MkdirAll(filepath.Dir(modelDir), 0755); err != nil {
			return nil, errors.Wrapf(err, "create mount dir for model: %s", modelDir)
		}
		if err := linkModelDir(sourceModelDir, modelDir); err != nil {
			return nil, err
		}
		_, err = worker.sm.Set(statusPath, status.Status{
			VolumeName:     volumeName,
			MountID:        mountID,
			Reference:      reference,
			ResolvedDigest: resolvedDigest,
			Reused:         found,
			State:          status.StatePullSucceeded,
		})
		if err != nil {
			return nil, errors.Wrapf(err, "set model status after linking from cache")
		}
		logger.WithContext(ctx).Infof(
			"linked model from cache: reference=%s digest=%s source=%s target=%s reused=%t",
			reference, resolvedDigest, sourceModelDir, modelDir, found,
		)
		return nil, nil
	})
critical

There is a critical bug in the use of singleflight.Do here. The singleflight key is derived from resolvedDigest, which is shared across all volumes pulling the same model. However, the function passed to Do also performs volume-specific side effects: linking the model into this caller's modelDir and writing the status at this caller's statusPath. If multiple volumes request the same digest concurrently, only the first caller's function runs, so only its directory is linked and its status updated. Subsequent callers receive the result of the first call, but their own modelDir remains empty and their status is never set to StatePullSucceeded.

Additionally, the cache path in PullModel lacks a kmutex lock for the volume, which could lead to race conditions if multiple operations are performed on the same volume concurrently.

To fix this, the logic that ensures the model is in the cache should be separated from the logic that links the cached model to the volume directory.

	cacheKey := "cache-" + resolvedDigest
	// Only the digest-scoped work runs inside the shared flight: make sure the
	// model is in the cache. The returned value reports whether the entry already
	// existed (true) or was pulled by the leader (false); callers that join an
	// in-progress pull receive the leader's value.
	cached, err, _ := worker.inflight.Do(cacheKey, func() (interface{}, error) {
		_, found, err := worker.getCachedModelDir(resolvedDigest)
		if err != nil {
			return nil, err
		}
		if found {
			return true, nil
		}

		cacheTmpModelDir := cacheModelDir + ".pulling"
		if err := os.MkdirAll(filepath.Dir(cacheModelDir), 0755); err != nil {
			return nil, errors.Wrapf(err, "create cache dir: %s", cacheModelDir)
		}
		if err := os.RemoveAll(cacheTmpModelDir); err != nil {
			return nil, errors.Wrapf(err, "cleanup temporary cache model dir: %s", cacheTmpModelDir)
		}

		err = worker.pullModel(
			ctx,
			statusPath,
			volumeName,
			mountID,
			reference,
			resolvedDigest,
			cacheTmpModelDir,
			checkDiskQuota,
			excludeModelWeights,
			excludeFilePatterns,
		)
		if err != nil {
			_ = os.RemoveAll(cacheTmpModelDir)
			return nil, err
		}

		if err := os.RemoveAll(cacheModelDir); err != nil {
			_ = os.RemoveAll(cacheTmpModelDir)
			return nil, errors.Wrapf(err, "cleanup cache model dir before rename: %s", cacheModelDir)
		}
		if err := os.Rename(cacheTmpModelDir, cacheModelDir); err != nil {
			_ = os.RemoveAll(cacheTmpModelDir)
			return nil, errors.Wrapf(err, "rename cache model dir to %s", cacheModelDir)
		}
		return false, nil
	})
	if err != nil {
		metrics.NodeOpObserve("pull_image", start, err)
		if !errors.Is(err, ErrConflict) {
			_ = worker.DeleteModel(ctx, isStaticVolume, volumeName, mountID)
		}
		return err
	}

	// Volume-specific work runs outside the flight, once for every caller.
	sourceModelDir, found, err := worker.getCachedModelDir(resolvedDigest)
	if err != nil {
		return err
	}
	if !found {
		// The entry may have been removed (e.g. by GC) after the flight returned.
		return errors.Errorf("cache entry for digest %s not found after pull", resolvedDigest)
	}
	reused, _ := cached.(bool)

	if err := os.MkdirAll(filepath.Dir(modelDir), 0755); err != nil {
		return errors.Wrapf(err, "create mount dir for model: %s", modelDir)
	}
	if err := linkModelDir(sourceModelDir, modelDir); err != nil {
		return err
	}

	_, err = worker.sm.Set(statusPath, status.Status{
		VolumeName:     volumeName,
		MountID:        mountID,
		Reference:      reference,
		ResolvedDigest: resolvedDigest,
		Reused:         reused,
		State:          status.StatePullSucceeded,
	})
	if err != nil {
		return errors.Wrapf(err, "set model status after linking from cache")
	}

	logger.WithContext(ctx).Infof(
		"linked model from cache: reference=%s digest=%s source=%s target=%s reused=%t",
		reference, resolvedDigest, sourceModelDir, modelDir, reused,
	)

Comment thread pkg/service/cache.go
Comment on lines +167 to +208
func (cm *CacheManager) gc() error {
	active, err := cm.activeCacheKeys()
	if err != nil {
		return err
	}

	cacheRoot := cm.cfg.Get().GetCacheSHA256Dir()
	entries, err := os.ReadDir(cacheRoot)
	if err != nil {
		if os.IsNotExist(err) {
			return nil
		}
		return errors.Wrapf(err, "read cache root: %s", cacheRoot)
	}

	deadline := time.Now().Add(-CacheTTL)
	for _, entry := range entries {
		if !entry.IsDir() {
			continue
		}
		name := entry.Name()
		if _, ok := active[name]; ok {
			continue
		}
		info, err := entry.Info()
		if err != nil {
			logger.Logger().WithError(err).Warnf("stat cache dir: %s", name)
			continue
		}
		if info.ModTime().After(deadline) {
			continue
		}
		dirPath := filepath.Join(cacheRoot, name)
		if err := os.RemoveAll(dirPath); err != nil {
			logger.Logger().WithError(err).Warnf("remove cache dir: %s", dirPath)
			continue
		}
		logger.Logger().Infof("removed expired cache dir: %s", dirPath)
	}

	return nil
}

medium

The garbage collection logic has a potential race condition. activeCacheKeys is computed by scanning mounted volumes, but a PullModel operation for a new volume might be in progress and not yet reflected in the status files. If GC runs at that moment and the cache entry is older than CacheTTL, it could be deleted while PullModel is linking from it. The 1-hour TTL only protects recently written entries (GC compares the directory's ModTime), so an older entry that is being reused gets no protection. It is better to treat entries that are currently being pulled or linked as active as well.
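One way to close the window described above is to track in-flight cache keys in memory and have GC check that set under the same lock it holds while deleting. The sketch below is a hypothetical helper, not code from the PR; it assumes PullModel would Acquire the key before looking up the cache entry and Release it after linking.

```go
package main

import "sync"

// inflightKeys (hypothetical) is a refcounted set of cache keys that are being
// pulled into or linked from right now. GC calls RemoveIfIdle instead of
// deleting directly, so the check and the delete happen under one lock.
type inflightKeys struct {
	mu     sync.Mutex
	counts map[string]int
}

// Acquire marks key as in use; call it before looking up the cache entry.
func (k *inflightKeys) Acquire(key string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.counts == nil {
		k.counts = make(map[string]int)
	}
	k.counts[key]++
}

// Release drops one reference taken by Acquire.
func (k *inflightKeys) Release(key string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.counts[key] <= 1 {
		delete(k.counts, key)
		return
	}
	k.counts[key]--
}

// RemoveIfIdle calls remove only when no caller holds key and reports
// whether remove was attempted. Reading a nil map is safe in Go.
func (k *inflightKeys) RemoveIfIdle(key string, remove func() error) (bool, error) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if k.counts[key] > 0 {
		return false, nil
	}
	return true, remove()
}
```

Holding the lock across the removal means an Acquire issued during a large RemoveAll blocks until the deletion finishes; the caller then finds the entry missing and pulls again, which is slower but never links from a half-deleted directory.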

@chlins closed this Apr 17, 2026