CloudZero Agent - AI Development Guide

Critical Principle: Use the Documentation

This project is extensively documented. When you don't know how to do something, start with the documentation, then verify against source code. You have access to both - use them together. The better you maintain the documentation as you work, the more valuable it becomes for future development.

The CloudZero Agent is a complex, multi-component Kubernetes integration system with multiple Go applications, extensive testing infrastructure, and comprehensive documentation. This guide helps you navigate and maintain that documentation system.

Documentation Architecture

Hierarchical Organization

Documentation is organized in layers of increasing specificity as you descend the directory tree:

root/                           # Project-wide documentation
├── README.md                   # Overview, components, message formats
├── DEVELOPMENT.md              # Build system, testing, deployment
├── CONTRIBUTING.md             # Contribution standards
├── CLAUDE.md                   # This file - AI navigation guide
│
├── app/                        # Go application code
│   ├── README.md              # Application architecture (hexagonal pattern)
│   ├── CLAUDE.md              # AI-specific development guidance
│   ├── domain/                # Business logic layer
│   │   └── README.md          # Domain patterns and usage
│   ├── functions/             # CLI applications
│   │   ├── collector/
│   │   │   └── README.md      # Collector-specific docs
│   │   └── webhook/
│   │       └── README.md      # Webhook-specific docs
│   └── types/                 # Core interfaces
│       └── README.md          # Interface contracts
│
├── helm/                       # Kubernetes deployment
│   ├── README.md              # Chart structure
│   ├── templates/             # Extensively commented YAML
│   ├── docs/                  # Operational guides
│   │   ├── troubleshooting-guide.md
│   │   └── cert-trouble-shooting.md
│   └── tests/                 # Helm unit tests (standard location)
│
├── tests/                      # Most tests live here
│   ├── README.md              # Testing philosophy and methodology
│   ├── helm/                  # Helm integration tests
│   │   ├── README.md          # Helm testing strategies
│   │   └── templates/         # Local test overrides (gitignored)
│   ├── kuttl/                 # Kubernetes integration tests
│   │   └── README.md          # KUTTL framework usage
│   └── integration/           # API integration tests
│       └── README.md          # Integration test patterns
│
├── clusters/                   # Multi-cluster deployment configs
│   └── README.md              # Cluster configuration system
│
└── docs/                       # Additional documentation
    └── testing/               # Testing guides

Key Documentation Locations

Source Code Documentation:

Go files: Fully documented with go doc comments
Helm templates: Extensive YAML comments explaining logic
Makefile: Commented targets and functions

Markdown Documentation:

Root level: Project-wide concerns (README.md, DEVELOPMENT.md, CONTRIBUTING.md)
Component level: Architecture and usage (app/README.md, helm/README.md)
Subdirectory level: Specific implementations (app/domain/README.md, tests/kuttl/README.md)

Essential Development Practices

1. Test-Driven Development

MANDATORY: Every development task must follow a test-driven approach:

Core principles:

Write tests first - Define expected behavior before implementation
Add tests incrementally - Create tests for each new feature or bug fix
Test at all levels - Unit, integration, Helm, and end-to-end
Use tests for debugging - When issues arise, add tests to reproduce and verify fixes

Test coverage requirements:

Go code: Unit tests with gomock for isolation, table-driven tests for scenarios
Helm charts: Unit tests (helm-unittest), schema validation, template tests, KUTTL e2e tests
Integration: API tests, smoke tests, complete workflow tests
Quality over quantity: Target >90% coverage with meaningful tests

2. Always Use the Makefile

DO NOT run tools directly. The Makefile ensures consistent tool versions, proper flags, and dependency handling:

# ✅ CORRECT - Use Makefile targets
make -j format lint test build
make test GO_TEST_TARGET=./app/domain
make generate

# ❌ WRONG - Direct tool invocation
go test ./...
go generate
golangci-lint run

Why Make is required:

Ensures consistent tool versions and flags
Handles dependencies and generated files correctly
Provides parallelism with -j flag
Integrates with project's build system properly

Parallelization: Use -j flag for parallel execution - the Makefile is designed for this:

make -j test-all              # Run all tests in parallel
make -j format lint analyze   # Multiple targets in parallel

3. Leverage the Cluster Configuration System

The clusters/ directory provides a powerful multi-cluster deployment system:

# Deploy to specific cluster
CLUSTER_NAME=my-cluster make helm-install helm-wait

# Use KIND for local development (default)
make helm-install helm-wait

# Uninstall from cluster
CLUSTER_NAME=my-cluster make helm-uninstall

Add your own clusters: Create clusters/my-cluster.yaml and clusters/my-cluster-overrides.yaml (gitignored by default)

4. Use Local Override Files for Testing

The tests/helm/templates/ directory gitignores files matching local-*.yaml:

# Create local test configuration
cat > tests/helm/templates/local-test-overrides.yaml <<EOF
components:
  collector:
    replicas: 3
EOF

# Generate manifest with overrides
make tests/helm/templates/local-test.yaml

# Diff against default to see from default installation
diff -u tests/helm/templates/manifest.yaml tests/helm/templates/local-test-manifest.yaml

This is an extremely powerful tool for understanding how configuration affects output.

5. Research Before Implementing

When you don't know something:

Check the relevant README - Start at the appropriate level (component, subdirectory)
Review CLAUDE.md files - AI-specific guidance in relevant directories
Review README.md and other markdown files - There is extensive documentation in markdown files in the tree
Read the source code - Extensively commented with go doc comments and inline docs
Check DEVELOPMENT.md - Build system, testing, and workflow patterns
Examine Makefile targets - Many capabilities are built into the Makefile

Component READMEs often document:

How to run tests for that specific component
Useful Makefile variables (e.g., GO_TEST_TARGET, GO_TEST_FLAGS)
Component-specific testing tricks and configurations
Common debugging approaches

Example Research Path for Adding a New Feature:

1. app/README.md           → Understand hexagonal architecture
2. app/types/README.md     → Review interface contracts
3. Similar component code  → See established patterns
4. DEVELOPMENT.md          → Understand testing requirements
5. Implement following patterns

Documentation Maintenance Protocol

CRITICAL: When making changes, documentation must be updated synchronously to avoid publishing outdated information.

Change Impact Assessment

For every code change, check documentation at all relevant levels:

flowchart TD
    Change[Make Code Change] --> Check{Check Documentation}

    Check -->|Source Level| Source[Update godoc comments<br/>Update YAML template comments<br/>Update inline documentation]

    Check -->|Component Level| Component[Update README.md<br/>Update CLAUDE.md<br/>Update examples]

    Check -->|System Level| System[Update root README.md<br/>Update DEVELOPMENT.md<br/>Update guides in docs/]

    Source --> Validate[Verify Consistency]
    Component --> Validate
    System --> Validate

    Validate --> Test[Test Examples]
    Test --> Commit[Commit All Changes Together]

Documentation Update Checklist

Before committing changes:

Updated godoc comments if function behavior changed
Updated YAML template comments if Helm logic changed
Updated component README if architecture/usage changed
Updated examples if interfaces changed
Verified examples execute successfully
Checked root README for outdated information
Updated DEVELOPMENT.md if build/test workflows changed

Bubble-Up Review Process

When making changes, bubble up through the directory tree checking documentation at each level:

Start at implementation - Update godoc/inline comments
Check subdirectory - Update component README/CLAUDE.md
Check parent directory - Update broader documentation
Check root - Update project-wide documentation if needed

Project Structure Overview

The repository produces a Helm Chart (helm/) that is mirrored to github.com/Cloudzero/cloudzero-charts - this is the primary user-facing artifact. Everything else serves development, testing, and build infrastructure:

root/
├── app/                        # All Go application code
│   ├── domain/                # Business logic (hexagonal architecture core)
│   ├── functions/             # CLI applications (collector, shipper, webhook, validator)
│   ├── types/                 # Core interfaces and contracts
│   ├── handlers/              # HTTP and request handlers
│   ├── storage/               # Data persistence layer
│   └── utils/                 # Shared utilities
│
├── helm/                       # Kubernetes deployment (USER-FACING)
│   ├── templates/             # Extensively commented K8s manifests
│   ├── tests/                 # Helm unit tests (standard Helm location)
│   └── docs/                  # Operational guides and troubleshooting
│
├── tests/                      # Comprehensive testing infrastructure
│   ├── integration/           # API integration tests
│   ├── helm/                  # Helm integration tests
│   │   └── templates/         # Test configurations (gitignored local-*.yaml)
│   ├── kuttl/                 # Kubernetes end-to-end tests
│   ├── smoke/                 # Production validation
│   └── load/                  # Performance testing
│
├── clusters/                   # Multi-cluster deployment configurations
├── docker/                     # Container build definitions
├── docs/                       # Additional documentation
├── scripts/                    # Build and utility scripts
└── .tools/                     # Development tool dependencies

Note that unit tests co-located with Go source as *_test.go files.

Testing Philosophy

This project has extensive testing infrastructure covering multiple layers:

Unit tests: Co-located with Go source files
Integration tests: tests/integration/ - API integration
Helm tests: helm/tests/ (unit) and tests/helm/ (integration)
Kubernetes tests: tests/kuttl/ - End-to-end cluster testing
Smoke tests: tests/smoke/ - Production validation
Load tests: tests/load/ - Performance testing

To run all tests: make -j test-all. WARNING: this is very slow, like > 10 minutes. Because it does a lot.

See tests/README.md for comprehensive testing methodology.

Navigation by Task Type

Adding/Modifying Go Code

app/README.md - Architecture overview
app/types/README.md - Interface contracts
Component-specific README - Detailed patterns
Source code - See existing implementations
DEVELOPMENT.md - Testing requirements

Helm Chart Development

helm/README.md - Chart structure
helm/docs/ - Operational guides
tests/helm/README.md - Testing strategies
Helm template comments - Inline documentation
DEVELOPMENT.md - Build workflows

Debugging/Troubleshooting

helm/docs/troubleshooting-guide.md - Common issues
helm/docs/cert-trouble-shooting.md - Certificate issues
Component README files - Behavior documentation
DEVELOPMENT.md - Build and deployment

Testing

Check component README first - Many directories document how to test that component
tests/README.md - Testing philosophy
Test type-specific README - Detailed guidance (e.g., tests/kuttl/README.md)
DEVELOPMENT.md - Test execution commands
Component README - Component testing patterns and useful tricks

Key Reminders

This project is extremely well-documented - Research before guessing
Always use Makefile targets - Never run tools directly
Use -j for parallel execution - Everything is designed for this
Leverage cluster configurations - Easy multi-cluster deployment
Use local override files - Powerful testing tool in tests/helm/templates/
Maintain documentation consistency - Update all levels when making changes
Follow established patterns - Consistency is critical in complex systems
Trust the inline documentation - Godoc and YAML comments are comprehensive

Quick Reference

# Essential development cycle
make -j format lint test build

# Comprehensive testing (WARNING: slow, > 10 minutes)
make -j test-all

# Targeted testing (very useful - check component READMEs for more tricks)
GO_TEST_TARGET=./app/domain make test                   # Test specific package
GO_TEST_TARGET=./app/functions/collector make test      # Test collector only
GO_TEST_FLAGS="-v -run TestSpecificFunction" make test  # Specific test

# Specific test categories
make test              # Go unit tests
make test-integration  # API integration tests
make test-smoke        # Smoke tests
make helm-test         # All Helm validation tests
make helm-test-kuttl   # Kubernetes end-to-end tests

# Helm operations
make helm-lint                             # Lint Helm chart
make helm-install helm-wait                # Deploy and wait for ready
CLUSTER_NAME=my-cluster make helm-install  # Deploy to custom cluster
make helm-uninstall                        # Remove from cluster

# Full workflow (creates cluster, tests, cleans up)
make kind-test  # Complete KIND testing workflow

# Cleanup
make clean      # Remove build artifacts
make kind-down  # Delete KIND cluster

# Tools and help
make install-tools  # Install development dependencies
make help           # List all available targets

Remember

You have access to extensive documentation and source code. Start with the documentation to understand the system, then verify your understanding against the source. The documentation exists at multiple levels - use the hierarchical structure to find what you need, and keep it up-to-date as you make changes. The more accurately you maintain it, the more valuable it becomes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CloudZero Agent - AI Development Guide

Critical Principle: Use the Documentation

Documentation Architecture

Hierarchical Organization

Key Documentation Locations

Essential Development Practices

1. Test-Driven Development

2. Always Use the Makefile

3. Leverage the Cluster Configuration System

4. Use Local Override Files for Testing

5. Research Before Implementing

Documentation Maintenance Protocol

Change Impact Assessment

Documentation Update Checklist

Bubble-Up Review Process

Project Structure Overview

Testing Philosophy

Navigation by Task Type

Adding/Modifying Go Code

Helm Chart Development

Debugging/Troubleshooting

Testing

Key Reminders

Quick Reference

Remember

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CloudZero Agent - AI Development Guide

Critical Principle: Use the Documentation

Documentation Architecture

Hierarchical Organization

Key Documentation Locations

Essential Development Practices

1. Test-Driven Development

2. Always Use the Makefile

3. Leverage the Cluster Configuration System

4. Use Local Override Files for Testing

5. Research Before Implementing

Documentation Maintenance Protocol

Change Impact Assessment

Documentation Update Checklist

Bubble-Up Review Process

Project Structure Overview

Testing Philosophy

Navigation by Task Type

Adding/Modifying Go Code

Helm Chart Development

Debugging/Troubleshooting

Testing

Key Reminders

Quick Reference

Remember