fix: cleanup orphaned deploys when services are renamed or removed from config #140

Open
vigneshrajsb wants to merge 2 commits into main from feat/cleanup-orphaned-services

vigneshrajsb commented Mar 23, 2026

Problem

When a service is renamed or removed from lifecycle.yaml, the old service's Kubernetes resources (deployments, services, ingress) and DB deploy records were left behind indefinitely. This caused:

  • Resource leaks accumulating in ephemeral environments over time
  • Stale deploy records in the database that no longer correspond to active config
  • Potential conflicts if a new service was later assigned a name that previously existed

Fix

Orphan detection now runs after every upsertDeployablesWithDatabase() call. Any Deployable that exists in the database for the current build but is no longer referenced in the incoming config is identified as orphaned and enqueued for asynchronous teardown.
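
For readers skimming the description, the orphan check amounts to a set difference between what the database holds for the build and what the incoming config names. The following is a minimal sketch of that idea only, not the code in this PR: the `DeployRecord` shape and the `findOrphanedDeploys` helper are hypothetical names chosen for illustration.

```typescript
// Hypothetical minimal shape; the real models carry many more fields.
interface DeployRecord {
  id: number;
  serviceName: string;
}

// Returns the records whose service name is absent from the incoming config.
function findOrphanedDeploys(
  existing: DeployRecord[],
  currentServiceNames: string[],
): DeployRecord[] {
  const current = new Set(currentServiceNames);
  return existing.filter((record) => !current.has(record.serviceName));
}

// Example: "api" was renamed to "gateway" in lifecycle.yaml.
const existing: DeployRecord[] = [
  { id: 1, serviceName: "api" },
  { id: 2, serviceName: "web" },
];
const orphans = findOrphanedDeploys(existing, ["gateway", "web"]);
console.log(orphans.map((record) => record.serviceName)); // only the old "api" record remains
```

In the PR itself each orphan is handed to enqueueDeployCleanup() rather than returned to the caller.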

Changes

  • src/shared/config.ts — Added DEPLOY_CLEANUP queue name constant
  • src/server/services/deploy.ts — Added deployCleanupQueue, enqueueDeployCleanup(), and processDeployCleanupQueue() to DeployService. The processor deletes Kubernetes resources via kubectl delete (for Docker/GitHub deploys) or helm uninstall (for Helm deploys), then marks the deploy record inactive in the DB
  • src/server/jobs/index.ts — Registered the DEPLOY_CLEANUP BullMQ worker
  • src/server/services/deployable.ts — Added cleanupOrphanedDeploys() which queries existing deploys for the build, diffs against the current set of upserted deployables, and calls enqueueDeployCleanup() for each orphan. When filterGithubRepositoryId is set (push webhook for a specific repo), the orphan query is scoped to that repo only — preventing services from other repos being falsely flagged as orphans on a filtered run
  • src/server/services/__tests__/deployCleanup.test.ts — Unit tests covering orphan detection logic and the cleanup worker (skipping already-torn-down deploys, Helm uninstall path, kubectl delete path, DB record deactivation, and filtered repo scoping)
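
The branching the cleanup processor performs (skip deploys that are already torn down, pick helm uninstall or kubectl delete by deploy type, then deactivate the DB record) can be sketched as a pure planning function. This is an illustrative sketch under stated assumptions: `CleanupTarget`, `CleanupStep`, and `planCleanup` are hypothetical names, and the real processor runs the commands rather than returning step labels.

```typescript
// Hypothetical types for illustration; not the PR's actual models.
type DeployType = "docker" | "github" | "helm";
type CleanupStep = "kubectl_delete" | "helm_uninstall" | "mark_inactive";

interface CleanupTarget {
  deployId: number;
  type: DeployType;
  tornDown: boolean;
}

// Decides which steps the worker would take, in order, for one orphaned deploy.
function planCleanup(target: CleanupTarget): CleanupStep[] {
  // Already torn down: nothing left to delete or deactivate.
  if (target.tornDown) {
    return [];
  }
  // Helm deploys are uninstalled; Docker/GitHub deploys are deleted via kubectl.
  const teardown: CleanupStep =
    target.type === "helm" ? "helm_uninstall" : "kubectl_delete";
  // The DB record is deactivated only after the cluster resources are handled.
  return [teardown, "mark_inactive"];
}

const helmPlan = planCleanup({ deployId: 7, type: "helm", tornDown: false });
const dockerPlan = planCleanup({ deployId: 8, type: "docker", tornDown: false });
const skippedPlan = planCleanup({ deployId: 9, type: "github", tornDown: true });
console.log(helmPlan, dockerPlan, skippedPlan);
```

Keeping the decision separate from the side effects is one way to unit test the three paths without a cluster, which matches the cases the test file is described as covering.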

Bug Fix (same PR)

Orphan cleanup incorrectly tore down valid deploys on filtered builds. When a push webhook fires for a single repo, upsertDeployables only processes services for that repo (remote services from other repos are intentionally skipped). The original cleanupOrphanedDeploys compared against all deployables in the DB, causing services from non-triggering repos to appear as orphans. Fixed by scoping the orphan check to repositoryId = filterGithubRepositoryId when the filter is active.
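
To make the failure mode concrete, here is a small sketch contrasting the unscoped and scoped checks. It assumes a simplified `DeployRow` shape and a hypothetical `findOrphans` helper; only the `filterGithubRepositoryId` name comes from this PR.

```typescript
// Hypothetical row shape; the real query works on database models.
interface DeployRow {
  id: number;
  serviceName: string;
  repositoryId: number;
}

// When a repository filter is given, only that repo's rows are orphan candidates.
function findOrphans(
  existing: DeployRow[],
  upsertedNames: string[],
  filterGithubRepositoryId?: number,
): DeployRow[] {
  const upserted = new Set(upsertedNames);
  const candidates =
    filterGithubRepositoryId === undefined
      ? existing
      : existing.filter((row) => row.repositoryId === filterGithubRepositoryId);
  return candidates.filter((row) => !upserted.has(row.serviceName));
}

const rows: DeployRow[] = [
  { id: 1, serviceName: "api", repositoryId: 100 },
  { id: 2, serviceName: "worker", repositoryId: 200 },
];

// A push to repo 100 upserts only "api"; "worker" from repo 200 is skipped on purpose.
const unscoped = findOrphans(rows, ["api"]); // original behaviour: flags "worker"
const scoped = findOrphans(rows, ["api"], 100); // fixed behaviour: flags nothing
console.log(unscoped.length, scoped.length);
```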

Testing

Unit tests added in deployCleanup.test.ts. Live DB/Redis integration tests were intentionally skipped per project convention for this PR — the cleanup path can be exercised manually by renaming a service in lifecycle.yaml and pushing a new commit against an active ephemeral environment.

🤖 Generated with Claude Code

…om config

Adds a DEPLOY_CLEANUP async queue and worker that tears down Kubernetes/Helm
resources and marks DB deploy records inactive for services no longer present
in lifecycle.yaml. Orphan detection runs after every upsertDeployablesWithDatabase
call so renames and removals are caught on the next config push.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vigneshrajsb vigneshrajsb requested a review from a team as a code owner March 23, 2026 03:02
… state after patch

When a service rename is reverted, an existing TORN_DOWN deploy was being
reused without resetting its status, causing it to be skipped during
re-deployment. Reset status to PENDING and active to true when a TORN_DOWN
deploy is matched to an active deployable.

After patching an existing deployable, refresh the in-memory object via
$query() so the returned deployables array reflects the latest DB state
rather than the pre-patch snapshot.

Also add jest fallback env vars so tests don't fail at module load time,
and expand deployCleanup test suite with mocks for nativeBuild and
LogArchivalService.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
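
The status reset described in the second commit (a TORN_DOWN deploy matched to an active deployable goes back to PENDING with active set to true) can be sketched as follows. The `Deploy` shape, the `DEPLOYED` status value, and the `reviveIfTornDown` helper are assumptions made for illustration; the commit applies the change through the ORM and then refreshes the object via $query().

```typescript
// TORN_DOWN and PENDING come from the commit message; DEPLOYED is a hypothetical extra value.
type DeployStatus = "PENDING" | "DEPLOYED" | "TORN_DOWN";

interface Deploy {
  id: number;
  status: DeployStatus;
  active: boolean;
}

// Returns a copy ready for re-deployment if the matched deploy was torn down earlier.
function reviveIfTornDown(deploy: Deploy): Deploy {
  if (deploy.status !== "TORN_DOWN") {
    return deploy; // Nothing to reset for deploys that are still live.
  }
  return { ...deploy, status: "PENDING", active: true };
}

// Example: a rename was reverted, so the old torn-down record is matched again.
const revived = reviveIfTornDown({ id: 3, status: "TORN_DOWN", active: false });
const untouched = reviveIfTornDown({ id: 4, status: "DEPLOYED", active: true });
console.log(revived.status, revived.active, untouched.status);
```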