📝 Walkthrough

Adds Perses deployment and dashboard provisioning to Tilt: a Helm chart and `helm_resource` for Perses, plus a watched dashboards directory that is packaged into a labeled ConfigMap for sidecar discovery.
🧹 Nitpick comments (2)
Tiltfile (2)
291-299: Consider adding `cortex-prometheus` to `resource_deps`.

The Perses datasource is configured to connect to `prometheus-operated:9090`. While the deployment will eventually work (Perses will retry), adding an explicit dependency would ensure proper startup ordering in the dev environment.

Suggested change:

```diff
 helm_resource(
     'cortex-perses',
     'perses/perses',
     flags=['--values=./tools/perses/values.yaml'],
     port_forwards=[port_forward(5080, 8080, name='perses')],
     links=[link('http://localhost:5080', 'perses dashboard')],
     labels=['Monitoring'],
-    resource_deps=['perses'],
+    resource_deps=['perses', 'cortex-prometheus'],
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tiltfile` around lines 291 - 299, The Perses helm_resource ('cortex-perses') may start before Prometheus and its Service (prometheus-operated:9090) is available; update the helm_resource call for 'cortex-perses' to include 'cortex-prometheus' in the resource_deps list so Tilt waits for the Prometheus resource before creating Perses, i.e., add 'cortex-prometheus' alongside the existing 'perses' entry in resource_deps of the helm_resource invocation.
300-306: ConfigMap creation approach works but is unconventional.

The piped `kubectl` command correctly creates a ConfigMap with the `perses.dev/resource=true` label for sidecar discovery. The `watch_file` ensures Tilt rebuilds when dashboards change.

One minor note: this ConfigMap won't appear as a labeled resource in Tilt's UI since there's no `k8s_resource()` call for it. If you want it visible under "Monitoring", you could add:

```python
k8s_resource(
    new_name='cortex-perses-dashboards',
    objects=['cortex-perses-dashboards:ConfigMap:default'],
    labels=['Monitoring'],
)
```

This is optional and depends on whether you want the ConfigMap visible in Tilt's resource list.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Tiltfile` around lines 300 - 306, The generated ConfigMap created via watch_file('./tools/perses/dashboards') + k8s_yaml(...) is correct but won’t show up in Tilt’s UI; add a k8s_resource declaration to surface it in the Tilt resource list. Add a k8s_resource call that references the ConfigMap name (cortex-perses-dashboards) and type (ConfigMap) and assigns it to the desired Tilt group/label (e.g., Monitoring) using the new_name and objects parameters so Tilt will display it as a resource.
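For reference, the full pattern this comment describes might look like the following in the Tiltfile. This is a hedged reconstruction from the review text, not the repository's actual code: the `local(...)` kubectl pipeline and the ConfigMap namespace are assumptions, and the resource and path names are taken from the comments above.

```python
# Tiltfile / Starlark sketch: regenerate the dashboards ConfigMap on change.
watch_file('./tools/perses/dashboards')
k8s_yaml(local(
    'kubectl create configmap cortex-perses-dashboards '
    '--from-file=./tools/perses/dashboards '
    '--dry-run=client -o yaml | '
    # Label it so the Perses sidecar discovers and loads the dashboards.
    'kubectl label --local -f - perses.dev/resource=true '
    '--dry-run=client -o yaml'
))
# Optional: surface the ConfigMap in Tilt's UI under "Monitoring".
k8s_resource(
    new_name='cortex-perses-dashboards',
    objects=['cortex-perses-dashboards:ConfigMap:default'],
    labels=['Monitoring'],
)
```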
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f95f27ff-7047-43d1-9be6-8696e1a60c99
📒 Files selected for processing (3)

- Tiltfile
- tools/perses/dashboards/project.json
- tools/perses/values.yaml
Test Coverage Report 📊: 68.6%
Actionable comments posted: 2
🧹 Nitpick comments (1)
tools/perses/dashboards/cortex-compute-kvm-overview.json (1)
376-389: Use the total-capacity metric for the host count card.

This panel is the odd one out: it counts hosts from a `type="utilized"` series, while the capacity totals come from `cortex_kvm_host_capacity_total`. Switching the count to the total metric keeps the card stable even when a host has no current workload sample.

♻️ Suggested query:

```diff
- "query": "count(max(cortex_kvm_host_capacity_usage{resource=\"cpu\", type=\"utilized\", availability_zone=~\"$availability_zone\", workload_type=~\"$workload_type\", cpu_architecture=~\"$cpu_architecture\", building_block=~\"$building_block\", compute_host=~\"$compute_host\", enabled=~\"$enabled\", maintenance=~\"$maintenance\", decommissioned=~\"$decommissioned\", external_customer=~\"$external_customer\"}) by (compute_host))",
+ "query": "count(max(cortex_kvm_host_capacity_total{resource=\"cpu\", availability_zone=~\"$availability_zone\", workload_type=~\"$workload_type\", cpu_architecture=~\"$cpu_architecture\", building_block=~\"$building_block\", compute_host=~\"$compute_host\", enabled=~\"$enabled\", maintenance=~\"$maintenance\", decommissioned=~\"$decommissioned\", external_customer=~\"$external_customer\"}) by (compute_host))",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/perses/dashboards/cortex-compute-kvm-overview.json` around lines 376 - 389, The host-count query in the TimeSeriesQuery (PrometheusTimeSeriesQuery spec under "queries") uses cortex_kvm_host_capacity_usage with type="utilized", which undercounts hosts without current utilization; replace that metric with cortex_kvm_host_capacity_total while keeping the same label filters (availability_zone, workload_type, cpu_architecture, building_block, compute_host, enabled, maintenance, decommissioned, external_customer) and the same aggregation (count(max(...) by (compute_host))) and seriesNameFormat ("Number of Hosts") so the card counts hosts based on total capacity instead of utilization.
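The check described here could also be automated as a small lint pass over the dashboard JSON. Below is a sketch assuming the Perses panel/query nesting quoted in the AI prompt above; `flagged_queries` and the sample document are fabricated for illustration only.

```python
import re

# Flag count() host-count queries built on the "utilized" usage metric
# instead of the total-capacity metric.
UTILIZED_COUNT = re.compile(
    r'count\(.*cortex_kvm_host_capacity_usage\{[^}]*type="utilized"'
)

def flagged_queries(dashboard: dict) -> list:
    """Return every panel query expression matching the anti-pattern."""
    hits = []
    for panel in dashboard.get("spec", {}).get("panels", {}).values():
        for q in panel.get("spec", {}).get("queries", []):
            expr = (
                q.get("spec", {})
                 .get("plugin", {})
                 .get("spec", {})
                 .get("query", "")
            )
            if UTILIZED_COUNT.search(expr):
                hits.append(expr)
    return hits

# Minimal fabricated dashboard fragment (not the real file).
sample = {
    "spec": {"panels": {"hosts": {"spec": {"queries": [{
        "spec": {"plugin": {"spec": {"query":
            'count(max(cortex_kvm_host_capacity_usage'
            '{resource="cpu", type="utilized"}) by (compute_host))'
        }}}
    }]}}}}
}

print(len(flagged_queries(sample)))  # → 1
```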
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4fa6148b-50b2-4f2a-a456-cccc6596ca6a
📒 Files selected for processing (1)
tools/perses/dashboards/cortex-compute-kvm-overview.json
```json
  "infrastructure",
  "kvm"
],
"project": "workload-management"
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
from pathlib import Path
dashboard = json.loads(Path("tools/perses/dashboards/cortex-compute-kvm-overview.json").read_text())
project = json.loads(Path("tools/perses/dashboards/project.json").read_text())
dashboard_project = dashboard["metadata"].get("project")
project_name = project["metadata"].get("name")
print(f"dashboard.metadata.project = {dashboard_project}")
print(f"project.metadata.name = {project_name}")
if dashboard_project != project_name:
    raise SystemExit("Mismatch detected")
PY
```

Repository: cobaltcore-dev/cortex
Length of output: 196
Fix project slug mismatch: use `workload_management` instead of `workload-management`.

The dashboard's `metadata.project` value (`workload-management`) does not match the project manifest name (`workload_management`). This underscore-dash drift will prevent the dashboard from attaching to the Perses project.

Change line 13 from:

```json
"project": "workload-management"
```

to:

```json
"project": "workload_management"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/perses/dashboards/cortex-compute-kvm-overview.json` at line 13, The
dashboard JSON has a project slug mismatch: update the metadata.project value
(the "project" key in the JSON object) from "workload-management" to the correct
manifest name "workload_management" so the dashboard will attach to the Perses
project; locate the "project" entry (currently "workload-management") and
replace the hyphen with an underscore to become "workload_management".
```json
"plugin": {
  "kind": "PrometheusLabelValuesVariable",
  "spec": {
    "labelName": "compute_host",
    "matchers": [
      "cortex_kvm_host_capacity_total{building_block=~\"$building_block\"}"
    ]
```
Scope the host picker with the upstream filters.

Right now `compute_host` only depends on `building_block`, so after narrowing by AZ/workload/architecture you can still select hosts outside that slice and end up with empty panels. Mirroring the already-defined filters here keeps the dropdown coherent.

♻️ Suggested matcher:

```diff
- "cortex_kvm_host_capacity_total{building_block=~\"$building_block\"}"
+ "cortex_kvm_host_capacity_total{availability_zone=~\"$availability_zone\", workload_type=~\"$workload_type\", cpu_architecture=~\"$cpu_architecture\", building_block=~\"$building_block\"}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/perses/dashboards/cortex-compute-kvm-overview.json` around lines 661 -
667, The compute_host PrometheusLabelValuesVariable is only scoped by
building_block; update its spec.matchers to mirror the upstream filters (AZ,
workload, architecture) so the host dropdown respects the same slice. Locate the
variable with labelName "compute_host" and replace the matcher
"cortex_kvm_host_capacity_total{building_block=~\"$building_block\"}" with a
matcher that includes the additional label filters, e.g.
"cortex_kvm_host_capacity_total{building_block=~\"$building_block\",
availability_zone=~\"$availability_zone\", workload=~\"$workload\",
architecture=~\"$architecture\"}" (use the exact upstream variable names used
elsewhere if they differ).
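One way to keep such matchers consistent with the dashboard's variables is to generate them from a single filter map. `scoped_matcher` below is a hypothetical helper, and the label names come from the queries quoted earlier in this review; verify them against the dashboard's actual variable names before use.

```python
def scoped_matcher(metric: str, filters: dict) -> str:
    """Build a PromQL series matcher where each label is regex-matched
    against a dashboard variable, e.g. building_block=~"$building_block"."""
    parts = ", ".join(f'{label}=~"${var}"' for label, var in filters.items())
    return f"{metric}{{{parts}}}"

# Label names assumed from the review's suggested matcher.
print(scoped_matcher(
    "cortex_kvm_host_capacity_total",
    {
        "availability_zone": "availability_zone",
        "workload_type": "workload_type",
        "cpu_architecture": "cpu_architecture",
        "building_block": "building_block",
    },
))
```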
Since we want to replace Plutono with Perses in the future, here is a PR that adds the Perses dependency.