fix(sdk): regenerate service.instance.id post-fork in MeterProvider and TracerProvider#5000
Conversation
| This hook runs post-fork in each worker and replaces service.instance.id | ||
| with a fresh UUID, ensuring each worker is a distinct instance. | ||
| """ | ||
| self._sdk_config.resource = self._sdk_config.resource.merge( |
There was a problem hiding this comment.
We are not setting this in the resource attributes, also not sure the issue is metrics specific.
There was a problem hiding this comment.
We are not setting this in the resource attributes
Not sure to get your point but if the user didn't set it, workers would be completely indistinguishable in the backend after fork, which is worse than having no ID at all. By always generating one post-fork we ensure every worker has a distinct identity regardless of whether the user configured it upfront.
On the second point: agreed, the problem affects both metrics and traces 👍🏼
There was a problem hiding this comment.
My point is that if we have a solution that work fine after forks() since the semantic conventions is now stable we can add this to the default resource detector. This way we have the very same attributes before and after fork.
b736894 to
873ff54
Compare
…nd TracerProvider When gunicorn (or any prefork server) forks workers, all workers inherit the same Resource from the master process, including the same service.instance.id. The SDK already restarts background threads post-fork (PeriodicExportingMetricReader, BatchProcessor) but never updates the resource identity. This causes metric collisions in OTLP backends where multiple workers exporting with the same resource identity result in incorrect aggregation instead of correct summation. Register an os.register_at_fork(after_in_child=...) hook on both MeterProvider and TracerProvider that replaces service.instance.id with a fresh UUID in each forked worker, ensuring distinct resource identities without any user configuration. Resource.merge() is used so all other resource attributes are preserved. WeakMethod is used for the hook reference, consistent with the existing pattern in PeriodicExportingMetricReader and BatchProcessor. Fixes: open-telemetry#4390 Related: open-telemetry#3885
63676db to
70b6fff
Compare
Description
When a prefork server (e.g. gunicorn) forks workers, all workers inherit the same
Resourcefrom the master process — including the sameservice.instance.id. The SDK already restarts background threads post-fork (PeriodicExportingMetricReader,BatchSpanProcessor) but never updates the resource identity. This causes metric and trace collisions in OTLP backends where multiple workers exporting with the same resource identity result in incorrect aggregation (last-write-wins) instead of correct summation.This PR registers an
os.register_at_fork(after_in_child=...)hook on bothMeterProviderandTracerProviderthat replacesservice.instance.idwith a fresh UUID in each forked worker. All other resource attributes are preserved viaResource.merge().WeakMethodis used for the hook reference, consistent with the existing pattern inPeriodicExportingMetricReaderandBatchSpanProcessor.Fixes #4390
Related: #3885
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Two new test files covering both providers:
tests/metrics/test_meter_provider_fork.pytests/trace/test_tracer_provider_fork.pyEach covers:
_at_fork_reinitassigns a newservice.instance.idOther resource attributes are preserved
A real fork() produces a distinct ID in the child vs the parent
4 concurrent forks each produce a unique ID
Run with:
pytest opentelemetry-sdk/tests/metrics/test_meter_provider_fork.pypytest opentelemetry-sdk/tests/trace/test_tracer_provider_fork.pyDoes This PR Require a Contrib Repo Change?
Checklist: