
Flux ResourceSet — API-Driven GitOps

flux-resourceset is a repo containing an example API service that powers an API-driven, GitOps-based model for managing Kubernetes clusters at enterprise scale. Instead of a central management cluster pushing configuration to child clusters, each child cluster pulls its own desired state from this API — and Flux reconciles the difference.

It is still a GitOps-based model: the ResourceSet templates that define how resources are rendered live in Git and follow standard GitOps review workflows. The API adds a dynamic data layer on top — what each cluster should run is served by the API, while how it is deployed is governed by version-controlled templates. The combination preserves GitOps principles (declarative, versioned, continuously reconciled) while adding the operational flexibility that enterprise multi-cluster management demands.

The Problem

Traditional enterprise Kubernetes platforms suffer from:

  • Slow provisioning — cluster creation taking weeks, not minutes
  • State divergence — configuration management tools (Ansible, Terraform, Puppet, Salt, or custom automation scripts), CMDB databases, and actual cluster state drifting apart over time
  • Manual release ceremonies — PRs, approvals, and tier-by-tier rollouts for every platform component change
  • Scaling bottlenecks — centralized push-based management that breaks down at hundreds of clusters
  • Infrastructure lock-in — tooling that assumes a specific cloud provider or VM provisioner, making hybrid and multi-cloud deployments painful

The Solution

This project implements a resource-driven, pull-based architecture where:

  1. A central API (this service) is the single source of truth for cluster configuration
  2. Each cluster’s Flux Operator phones home to fetch its desired state
  3. ResourceSet templates render Kubernetes resources from the API response
  4. Flux continuously reconciles — any API change is automatically applied

This model is infrastructure-agnostic. It works on bare-metal on-premises data centers, private cloud, public cloud (AWS EKS, Azure AKS, GCP GKE), edge locations, or any hybrid combination. The only requirement is that each cluster can make outbound HTTPS requests to the API endpoint.

graph TB
    API["flux-resourceset API<br/>(single source of truth)"]

    subgraph "Child Cluster 1"
        P1["ResourceSetInputProvider<br/>(polls every 5m)"]
        RS1["ResourceSet<br/>(renders templates)"]
        K1["Flux Kustomize/Helm<br/>(reconciles)"]
        P1 -->|"fetches inputs"| RS1
        RS1 -->|"creates resources"| K1
    end

    subgraph "Child Cluster 2"
        P2["ResourceSetInputProvider"]
        RS2["ResourceSet"]
        K2["Flux Kustomize/Helm"]
        P2 --> RS2 --> K2
    end

    subgraph "Child Cluster N"
        PN["ResourceSetInputProvider"]
        RSN["ResourceSet"]
        KN["Flux Kustomize/Helm"]
        PN --> RSN --> KN
    end

    P1 -->|"GET /clusters/{dns}/platform-components"| API
    P2 -->|"GET /clusters/{dns}/namespaces"| API
    PN -->|"GET /clusters/{dns}/rolebindings"| API

Key Upstream Projects

This architecture builds on two open-source projects:

  • Flux Operator — provides the ResourceSet and ResourceSetInputProvider CRDs that power the templating and phone-home polling. The ExternalService input type is the foundation this architecture is built on. (GitHub)
  • Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code. Firestone defines the resource schemas (cluster, platform_component, namespace, rolebinding) that drive code generation for this project.

What This Service Does

flux-resourceset reads cluster configuration data, merges per-cluster overrides with catalog defaults, and returns responses in the {"inputs": [...]} format that the Flux Operator’s ResourceSetInputProvider (ExternalService type) requires.

Each resource type gets its own endpoint:

| Endpoint | What It Returns |
| --- | --- |
| GET /api/v2/flux/clusters/{dns}/platform-components | HelmRelease + HelmRepository + ConfigMap inputs per component |
| GET /api/v2/flux/clusters/{dns}/namespaces | Namespace inputs with labels and annotations |
| GET /api/v2/flux/clusters/{dns}/rolebindings | ClusterRoleBinding inputs with subjects |
| GET /api/v2/flux/clusters | Cluster list for management plane provisioning |

Key Concepts

| Concept | Description |
| --- | --- |
| Phone-home model | Clusters pull config; the API never pushes. Scales to thousands of clusters. |
| Resource-driven development | Define resources (clusters, components, namespaces) as structured data. Templates turn data into Kubernetes manifests. |
| Dynamic patching | Per-cluster, per-component value overrides without touching Git. Change a replica count in the API and watch Flux reconcile. |
| Catalog + overrides | Platform components live in a catalog with defaults. Each cluster can override oci_tag, component_path, or inject custom patches. |
| ExternalService contract | All responses follow {"inputs": [{"id": "...", ...}]} — the format Flux Operator requires. |
| Infrastructure-agnostic | Works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |

Quick Start

cd flux-resourceset
make demo          # Creates kind cluster, installs Flux, deploys API + demo data
make cli-demo      # Runs the CLI demo flow end-to-end

See the Local Demo chapter for full details.

System Overview

The architecture separates concerns into three layers: the data plane (where cluster config lives), the API plane (this service), and the cluster plane (Flux running on each child cluster).

High-Level Architecture

graph TB
    subgraph "Data Layer"
        DB[("Data Store<br/>(SQLite / In-Memory)")]
    end

    subgraph "API Layer"
        READ["flux-resourceset<br/>(read-only mode)"]
        CRUD["flux-resourceset<br/>(CRUD mode)"]
        CLI["flux-resourceset-cli"]
    end

    subgraph "Cluster Layer"
        subgraph "Child Cluster"
            RSIP["ResourceSetInputProvider<br/>type: ExternalService"]
            RS["ResourceSet<br/>(templates)"]
            HR["HelmRelease / Kustomization"]
            NS["Namespace"]
            RB["ClusterRoleBinding"]
        end
    end

    DB -->|"read"| READ
    DB <-->|"read/write"| CRUD
    CLI -->|"CRUD operations"| CRUD
    RSIP -->|"polls"| READ
    RSIP -->|"inputs"| RS
    RS -->|"renders"| HR
    RS -->|"renders"| NS
    RS -->|"renders"| RB

Component Roles

Data Store

By default, this is SQLite (configured via DATABASE_URL). For lightweight/dev workflows it can run in-memory (STORE_BACKEND=memory) using data/seed.json as initial state.

The store holds four logical resource sets:

  • clusters — each cluster’s full configuration: assigned components, namespaces, rolebindings, and per-component patches
  • platform_components — component catalog entries with defaults, OCI URLs/tags, and dependencies
  • namespaces — reusable namespace definitions referenced by clusters
  • rolebindings — reusable RBAC rolebinding definitions referenced by clusters

API Service (flux-resourceset)

A Rust service built with axum that operates in two modes:

| Mode | Purpose | Endpoints |
| --- | --- | --- |
| read-only | Flux polling — high concurrency, minimal resource usage | /api/v2/flux/..., /health, /ready |
| crud | Operator/CLI access — full CRUD for managing cluster state | All read endpoints + /clusters, /platform_components, /namespaces, /rolebindings |

The read-only mode is designed to run as a multi-replica deployment serving cluster polls. The CRUD mode is for operators and CI/CD pipelines that need to modify cluster configuration.

CLI (flux-resourceset-cli)

A command-line tool for interacting with the CRUD API. Supports listing, creating, and patching resources. Used for demos and operational workflows.

Flux Operator (on each cluster)

Each cluster runs:

  1. ResourceSetInputProvider — calls the API on a schedule, fetches {"inputs": [...]}
  2. ResourceSet — takes the inputs and renders Kubernetes manifests from templates
  3. Flux controllers — reconcile the rendered manifests (HelmRelease, Kustomization, Namespace, etc.)

Data Flow

sequenceDiagram
    participant Operator as Operator / CLI
    participant API as flux-resourceset (CRUD)
    participant DB as Data Store
    participant ReadAPI as flux-resourceset (read-only)
    participant Cluster as Child Cluster (Flux)

    Operator->>API: PATCH /clusters/demo-cluster-01<br/>{"patches": {"podinfo": {"replicaCount": "3"}}}
    API->>DB: Update cluster document
    API-->>Operator: 200 OK

    Note over Cluster: Every 5 minutes (or on-demand)

    Cluster->>ReadAPI: GET /api/v2/flux/clusters/{dns}/platform-components
    ReadAPI->>DB: Fetch cluster + catalog docs
    DB-->>ReadAPI: Cluster doc + component catalog
    ReadAPI->>ReadAPI: Merge overrides with catalog defaults
    ReadAPI-->>Cluster: {"inputs": [{...component with patches...}]}

    Cluster->>Cluster: ResourceSet renders HelmRelease with patched values
    Cluster->>Cluster: Flux reconciles — podinfo scales to 3 replicas

Why This Architecture

vs. Push-Based (ArgoCD ApplicationSets, central Flux)

| Concern | Push-based | Phone-home (this) |
| --- | --- | --- |
| Scalability | Management cluster must maintain connections to all children | Each cluster independently polls; API is stateless |
| Failure blast radius | Management cluster outage = all clusters lose reconciliation | API outage = clusters keep running last-known state |
| Network requirements | Management cluster needs outbound access to all clusters | Clusters need outbound access to one API endpoint |
| Credential management | Management cluster holds kubeconfigs for all clusters | Each cluster holds one bearer token |

vs. Git-per-Cluster

| Concern | Git-per-cluster | API-driven (this) |
| --- | --- | --- |
| Updating 500 clusters | 500 PRs or complex monorepo tooling | One API call to update the component catalog |
| Per-cluster overrides | Branch strategies or overlay directories | First-class patches object per cluster |
| Audit trail | Git history | API audit log + Git history for templates |
| Dynamic response | Static YAML files | Merge logic computes cluster-specific state |

vs. Direct Kubernetes API Access

A common question is: why not have operators kubectl apply directly, or build tooling that talks to the Kubernetes API on each cluster? See the FAQ for a detailed answer. The short version: a purpose-built API gives you a single control point with business logic, validation, audit logging, and integration hooks — things the raw Kubernetes API does not provide at fleet scale.

Infrastructure Agnostic

This architecture has no dependency on a specific cloud provider, VM provisioner, or Kubernetes distribution. The phone-home pattern requires only one thing: outbound HTTPS from each cluster to the API.

graph TB
    API["flux-resourceset API"]

    subgraph "On-Premises Data Center"
        OP1["Bare-metal cluster"]
        OP2["VMware vSphere cluster"]
    end

    subgraph "Public Cloud"
        AWS["AWS EKS"]
        AZ["Azure AKS"]
        GCP["GCP GKE"]
    end

    subgraph "Edge"
        E1["Edge location 1"]
        E2["Edge location 2"]
    end

    OP1 & OP2 -->|"HTTPS"| API
    AWS & AZ & GCP -->|"HTTPS"| API
    E1 & E2 -->|"HTTPS"| API

| Environment | How It Works |
| --- | --- |
| On-prem bare metal | Clusters provisioned via PXE boot, cloud-init, or immutable OS images. Flux bootstrap manifests pre-installed or applied post-boot. |
| On-prem VMs | VMware, KVM, Hyper-V, or any hypervisor. Same bootstrap pattern — inject identity, let Flux phone home. |
| Public cloud managed K8s | EKS, AKS, GKE — deploy Flux Operator as an add-on or Helm chart. Providers and ResourceSets applied via GitOps or cluster bootstrap. |
| Edge / remote sites | Lightweight clusters (k3s, k0s, MicroK8s) at edge locations. Phone home over VPN or direct HTTPS. |
| Hybrid | Mix any of the above. Each cluster phones home to the same API regardless of where it runs. |

The cluster provisioning mechanism is completely decoupled from the platform component management. Whether you use Terraform, Crossplane, Cluster API, custom scripts, or manual provisioning — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop takes over.

Phone-Home Model

The phone-home model is the core architectural pattern. Every child cluster is self-managing — it phones home to the API to discover its desired state, then reconciles locally. The provisioning layer’s only job is creating the cluster infrastructure and injecting a bootstrap identity. After that, the child cluster is autonomous.

How It Works

sequenceDiagram
    participant Mgmt as Management Cluster
    participant VM as Child Cluster VMs
    participant Flux as Flux Operator
    participant API as flux-resourceset API

    Mgmt->>VM: Provision cluster infrastructure<br/>Inject cluster-identity ConfigMap
    VM->>Flux: Cluster boots → Flux Operator starts
    Flux->>Flux: Reads cluster-identity ConfigMap<br/>(CLUSTER_NAME, CLUSTER_DNS, ENVIRONMENT)

    loop Every reconcile interval
        Flux->>API: GET /clusters/{CLUSTER_DNS}/platform-components
        API-->>Flux: {"inputs": [...components...]}
        Flux->>Flux: ResourceSet renders HelmRelease per component
        Flux->>Flux: Flux reconciles rendered resources

        Flux->>API: GET /clusters/{CLUSTER_DNS}/namespaces
        API-->>Flux: {"inputs": [...namespaces...]}
        Flux->>Flux: ResourceSet renders Namespace resources

        Flux->>API: GET /clusters/{CLUSTER_DNS}/rolebindings
        API-->>Flux: {"inputs": [...bindings...]}
        Flux->>Flux: ResourceSet renders ClusterRoleBinding resources
    end

    Note over Mgmt: Management cluster is out of the loop<br/>for all platform component management

Bootstrap Flow

The bootstrap sequence is designed so that every cluster starts identically and differentiates itself only through the API response:

  1. Cluster provisioning — The infrastructure layer creates the cluster. This could be VMs from immutable OS images, cloud-managed Kubernetes (EKS, AKS, GKE), bare-metal nodes via PXE boot, or any other provisioning method. The Flux Operator bootstrap manifests are pre-installed in the image or applied post-boot.

  2. Identity injection — A cluster-identity ConfigMap is the only cluster-specific data injected during provisioning:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-identity
      namespace: flux-system
    data:
      CLUSTER_NAME: "us-east-prod-01"
      CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
      ENVIRONMENT: "prod"
      INTERNAL_API_URL: "https://internal-api.internal.example.com"
    
  3. Flux bootstrap — The cluster boots. Pre-installed or applied manifests start the Flux Operator and deploy the ResourceSetInputProviders + ResourceSets.

  4. Phone home — Each ResourceSetInputProvider calls the API using the cluster’s DNS name from the identity ConfigMap. The API returns that cluster’s specific configuration.

  5. Self-reconciliation — Flux renders and reconciles. From this point forward, the cluster is self-managing.

What Happens When the API Is Unreachable

The phone-home model degrades gracefully:

| Scenario | Cluster Behavior |
| --- | --- |
| API down for minutes | ResourceSetInputProvider goes not-ready. Existing Flux resources continue reconciling from cached state. No disruption. |
| API down for hours | Same — clusters keep running. They just cannot pick up new configuration changes. |
| API returns changed data | On next successful poll, ResourceSet re-renders. Flux applies the diff. |
| API returns empty inputs | Flux garbage-collects all resources the ResourceSet previously created. This is the decommission path. |
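This degradation behavior can be sketched as a poll step that only overwrites the last-known inputs on a successful fetch. The sketch below is illustrative Python (the real implementation lives in the Flux Operator, and the function and cache names here are invented for the example):

```python
def poll_with_fallback(fetch_inputs, cache):
    """Fetch the latest inputs; on failure, keep the last-known state.

    Mirrors the table above: an unreachable API never disrupts the
    cluster, because Flux keeps reconciling whatever was cached last.
    """
    try:
        # e.g. GET /api/v2/flux/clusters/{dns}/platform-components
        cache["inputs"] = fetch_inputs()
    except Exception:
        pass  # API down: keep reconciling the cached desired state
    return cache.get("inputs", [])
```

Note the asymmetry: a failed poll preserves the cache, but a successful poll that returns an empty inputs list replaces it, which is what triggers garbage collection on the decommission path.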

Separation of Concerns

graph LR
    subgraph "Provisioning Layer"
        A["Cluster Provisioning<br/>(Terraform, Cluster API,<br/>cloud CLI, PXE, etc.)"]
        B["DNS / Networking<br/>Setup"]
        C["Identity Injection<br/>(cluster-identity ConfigMap)"]
    end

    subgraph "API Layer"
        D["Single Source of Truth<br/>for all cluster configuration"]
    end

    subgraph "Child Cluster"
        E["Platform component<br/>deployment & reconciliation"]
        F["Namespace & RBAC<br/>management"]
    end

    A --> C
    B --> C
    D -.->|"polled by"| E
    D -.->|"polled by"| F

The provisioning layer never deploys platform components to child clusters. It creates infrastructure and injects identity. The child cluster owns its own desired state by polling the API. This separation means the provisioning tooling (whether Terraform, Cluster API, Crossplane, custom scripts, or a management cluster) has no ongoing role in platform component management.

Per-Resource-Type Providers

Each resource type gets its own ResourceSetInputProvider + ResourceSet pair. This separation ensures:

  • Independent reconciliation — a namespace change does not trigger platform component re-rendering
  • Independent failure — if one provider fails, others continue working
  • Clear templates — each ResourceSet template is focused on one resource type

| Resource Type | Provider Name | Endpoint |
| --- | --- | --- |
| Platform components | platform-components | /api/v2/flux/clusters/{dns}/platform-components |
| Namespaces | namespaces | /api/v2/flux/clusters/{dns}/namespaces |
| Role bindings | rolebindings | /api/v2/flux/clusters/{dns}/rolebindings |

All providers are pre-installed in every cluster’s bootstrap manifests. The cluster does not need to know what resource types exist — it polls all of them from boot.

Resource-Driven Development

Resource-driven development is the design philosophy behind this architecture. Instead of writing imperative scripts or maintaining per-cluster YAML, you define resources as structured data and let templates + reconciliation handle the rest.

The Idea

Every entity in the platform is a resource with a schema:

erDiagram
    CLUSTER ||--o{ COMPONENT_REF : "has platform_components"
    CLUSTER ||--o{ NAMESPACE_REF : "has namespaces"
    CLUSTER ||--o{ ROLEBINDING_REF : "has rolebindings"
    CLUSTER ||--o{ PATCH : "has patches"
    COMPONENT_REF }o--|| CATALOG_ENTRY : "references"
    NAMESPACE_REF }o--|| NAMESPACE_DEF : "references"
    ROLEBINDING_REF }o--|| ROLEBINDING_DEF : "references"

    CLUSTER {
        string id PK
        string cluster_name
        string cluster_dns
        string environment
    }
    COMPONENT_REF {
        string id FK
        boolean enabled
        string oci_tag "nullable override"
        string component_path "nullable override"
    }
    CATALOG_ENTRY {
        string id PK
        string component_path
        string component_version
        string oci_url
        string oci_tag
        boolean cluster_env_enabled
        string[] depends_on
    }
    NAMESPACE_REF {
        string id
    }
    ROLEBINDING_REF {
        string id
    }
    NAMESPACE_DEF {
        string id PK
        object labels
        object annotations
    }
    ROLEBINDING_DEF {
        string id PK
        string role
        object[] subjects
    }
    PATCH {
        string component_id FK
        object key_values
    }

cluster.namespaces and cluster.rolebindings are reference arrays (id only). Full namespace/rolebinding payloads live in their own definition resources and are resolved during merge.

Resources are declared, not scripted. The API merges them. Templates render them. Flux reconciles them.

Three-Layer Separation

The architecture cleanly separates what from how from where:

| Layer | Responsibility | Who Owns It | Example |
| --- | --- | --- | --- |
| Data | What should exist on each cluster | Platform operators via API/CLI | “Cluster X should have cert-manager v1.14.0 with 3 replicas” |
| Templates | How resources are rendered into Kubernetes manifests | Platform engineers via Git | ResourceSet template that turns an input into a HelmRelease |
| Reconciliation | Where and when resources are applied | Flux Operator (automated) | Flux detects drift and applies the diff |

This separation means:

  • Operators change cluster state by updating data (API calls), not by writing YAML
  • Engineers change how things are deployed by updating templates (Git PRs), not by touching every cluster
  • Flux handles the convergence loop — no manual kubectl apply, no configuration management playbooks, no custom deployment scripts

How a Change Flows Through the System

Example: Adding a new platform component to 50 clusters

Traditional approach:

  1. Write Helm values for 50 clusters (or complex overlay structure)
  2. Open PR to add component to each cluster’s directory
  3. Wait for PR review and merge
  4. Watch tier-by-tier rollout
  5. Debug failures per-cluster

Resource-driven approach:

  1. Add the component to the catalog (one API call)
  2. Add a component reference to each cluster’s platform_components array (one API call per cluster, or a batch script)
  3. Done — Flux picks it up on next poll

flowchart LR
    A["API call:<br/>Add component<br/>to catalog"] --> B["API call:<br/>Add component_ref<br/>to cluster doc"]
    B --> C["Next poll cycle:<br/>Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRepo + HelmRelease"]
    D --> E["Flux reconciles:<br/>component installed"]
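The batch script mentioned in step 2 could be as simple as building one operation per cluster. This hypothetical Python sketch only constructs (method, url, body) tuples rather than sending them; the base URL, the PATCH semantics, and the POST body shape are assumptions for illustration, not the service's documented behavior:

```python
def batch_add_component(component_id, cluster_ids,
                        base_url="https://api.example.com"):
    """Build the fleet-wide operations for rolling out one catalog component.

    Returns the operations instead of executing them, so the transport
    (curl, an HTTP client, a CI job) stays up to the caller.
    """
    # one call: add the component to the catalog
    ops = [("POST", f"{base_url}/platform_components", {"_id": component_id})]
    # one call per cluster: append a component reference
    for cluster_id in cluster_ids:
        ops.append((
            "PATCH",
            f"{base_url}/clusters/{cluster_id}",
            {"platform_components": [{"id": component_id, "enabled": True}]},
        ))
    return ops
```

Because the clusters pull on their own schedule, the rollout needs no further orchestration once these calls succeed.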

Example: Patching a component value on one cluster

flowchart LR
    A["CLI:<br/>patch-component podinfo<br/>--set replicaCount=3"] --> B["API updates<br/>cluster.patches.podinfo"]
    B --> C["Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRelease with<br/>valuesFrom ConfigMap"]
    D --> E["Flux reconciles:<br/>podinfo scales to 3"]

No Git PR. No pipeline. The data change flows through the system automatically.
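The PATCH body that flows through this path is small. A sketch of building it, following the shape shown in the data-flow diagram (the helper name is invented; patch values travel as strings because they land in a ConfigMap consumed via HelmRelease valuesFrom):

```python
def patch_component_payload(component_id, **values):
    """Build the per-cluster patches payload for one component."""
    return {"patches": {component_id: {k: str(v) for k, v in values.items()}}}
```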

Resource Schemas as API Contracts

Each resource type has a defined schema managed via Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code generation artifacts.

The schemas:

  • cluster (v2) — the full cluster document with arrays of component refs, namespace refs, rolebinding refs, and a patches object
  • platform_component (v1) — the catalog entry with OCI URLs, versions, dependencies
  • namespace (v1) — namespace with labels and annotations
  • rolebinding (v1) — role binding with subjects

These schemas are the single source of truth for:

  • OpenAPI spec generation (openapi/openapi.yaml) — used for API documentation and client generation
  • Rust model generation (src/models/, src/apis/) — the structs the API service uses
  • CLI code generation (src/generated/cli/) — the CLI commands for each resource type

When a schema changes, make generate regenerates all downstream artifacts. This ensures the API, CLI, and documentation stay in sync with the resource definitions. See the Firestone documentation for the full schema language and generator options.

Benefits for Enterprise

Auditability

Every state change goes through the API. The API can log who changed what, when. Combined with Git history for templates, you have a full audit trail.

Consistency

The merge logic guarantees that every cluster gets a consistent, computed response. No hand-edited YAML files that drift.

Velocity

Operators can change cluster state in seconds. No PR cycles for operational changes. Reserve Git PRs for template/structural changes.

Testability

Because resources are structured data, you can:

  • Validate schemas before applying
  • Unit test merge logic
  • Integration test API responses against the ExternalService contract
  • Dry-run template rendering
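As a taste of the first bullet, a minimal field check over a cluster document, in illustrative Python. This is a stand-in only: the authoritative validation comes from the Firestone JSON Schema definitions, and the required-field set here is assumed from the fields used throughout this document:

```python
# assumed minimal required fields; the real schema is richer
CLUSTER_REQUIRED = {"cluster_name": str, "cluster_dns": str, "environment": str}

def check_cluster_doc(doc):
    """Return the names of required fields that are missing or mistyped."""
    return [name for name, typ in CLUSTER_REQUIRED.items()
            if not isinstance(doc.get(name), typ)]
```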

Separation of Permissions

  • Template changes (how things deploy) require Git PR review
  • Data changes (what is deployed where) require API auth tokens
  • Reconciliation is automated — no human in the loop

API Reference

All endpoints return JSON. Flux-facing endpoints return the {"inputs": [...]} structure required by the ResourceSetInputProvider ExternalService contract. CRUD endpoints follow standard REST conventions.

Authentication

All endpoints require a Bearer token in the Authorization header.

| Mode | Read Token | Write Token |
| --- | --- | --- |
| read-only | AUTH_TOKEN env var | N/A (no write endpoints) |
| crud | AUTH_TOKEN env var | CRUD_AUTH_TOKEN env var (falls back to AUTH_TOKEN) |

curl -H "Authorization: Bearer $AUTH_TOKEN" http://localhost:8080/health
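The token fallback rule from the table can be stated in a few lines. A sketch in Python (the function name is illustrative; the env var names are the documented ones):

```python
import os

def resolve_tokens(env=None):
    """Resolve read/write bearer tokens: CRUD_AUTH_TOKEN falls back to AUTH_TOKEN."""
    env = os.environ if env is None else env
    read_token = env.get("AUTH_TOKEN")
    write_token = env.get("CRUD_AUTH_TOKEN") or read_token
    return read_token, write_token
```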

Flux Read Endpoints

These endpoints are consumed by Flux Operator’s ResourceSetInputProvider. They follow the ExternalService contract.

ExternalService Contract

Every response must satisfy:

  • Top-level inputs array
  • Each item has a unique string id
  • Response body under 900 KiB
  • All JSON value types (strings, numbers, booleans, arrays, objects) are preserved in templates
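The contract rules above are easy to check in a test suite. A minimal validator sketch (illustrative Python, not part of the service; the real enforcement happens on the Flux Operator side):

```python
import json

MAX_BODY = 900 * 1024  # 900 KiB cap from the ExternalService contract

def validate_external_service_response(body):
    """Return a list of contract violations (empty list means valid)."""
    errors = []
    inputs = body.get("inputs")
    if not isinstance(inputs, list):
        return ["missing top-level 'inputs' array"]
    ids = [item.get("id") for item in inputs]
    if any(not isinstance(i, str) for i in ids):
        errors.append("every item needs a string 'id'")
    if len(set(ids)) != len(ids):
        errors.append("ids must be unique")
    if len(json.dumps(body).encode()) > MAX_BODY:
        errors.append("response exceeds 900 KiB")
    return errors
```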

GET /api/v2/flux/clusters/{cluster_dns}/platform-components

Returns platform components assigned to a cluster, with catalog defaults merged and per-cluster overrides applied.

Path parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| cluster_dns | string | The cluster’s DNS name (e.g., demo-cluster-01.k8s.example.com) |

Response:

{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "cert-manager",
      "component_version": "latest",
      "cluster_env_enabled": false,
      "depends_on": [],
      "enabled": true,
      "patches": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      },
      "source": {
        "oci_url": "https://charts.jetstack.io",
        "oci_tag": "latest"
      }
    }
  ]
}

Field reference:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique component identifier, used as Flux resource name suffix |
| component_path | string | Chart name or path within OCI artifact. Cluster override takes precedence over catalog default |
| component_version | string | Upstream version. "latest" means no version pinning |
| cluster_env_enabled | boolean | If true, ResourceSet template appends /{environment} to the path |
| depends_on | string[] | Component IDs that must be healthy first. Empty = no dependencies |
| enabled | boolean | false causes Flux to garbage-collect the component |
| patches | object | Per-cluster key-value overrides, injected via HelmRelease valuesFrom |
| cluster.name | string | Cluster identifier |
| cluster.dns | string | Cluster FQDN |
| cluster.environment | string | Tier: dev, qa, uat, prod |
| source.oci_url | string | Helm repository or OCI registry URL |
| source.oci_tag | string | Chart/artifact version tag. Cluster override takes precedence |
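The cluster_env_enabled rule, applied in the ResourceSet template rather than by the API, amounts to a one-line path computation. A sketch of the rule (the function name is illustrative):

```python
def effective_path(component_path, cluster_env_enabled, environment):
    """When cluster_env_enabled is true, the template appends /{environment},
    so one catalog entry can serve per-tier layouts."""
    if cluster_env_enabled:
        return f"{component_path}/{environment}"
    return component_path
```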

GET /api/v2/flux/clusters/{cluster_dns}/namespaces

Returns namespaces assigned to a cluster.

Response:

{
  "inputs": [
    {
      "id": "cert-manager",
      "labels": { "app": "cert-manager" },
      "annotations": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}

GET /api/v2/flux/clusters/{cluster_dns}/rolebindings

Returns role bindings assigned to a cluster.

Response:

{
  "inputs": [
    {
      "id": "platform-admins",
      "role": "cluster-admin",
      "subjects": [
        {
          "kind": "Group",
          "name": "platform-team",
          "apiGroup": "rbac.authorization.k8s.io"
        }
      ],
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}

GET /api/v2/flux/clusters

Returns all clusters. Used by management cluster provisioners.

Response:

{
  "inputs": [
    {
      "id": "demo-cluster-01",
      "cluster_name": "demo-cluster-01",
      "cluster_dns": "demo-cluster-01.k8s.example.com",
      "environment": "dev"
    }
  ]
}

CRUD Endpoints

Available when API_MODE=crud. These follow standard REST patterns.

Clusters

| Method | Path | Description |
| --- | --- | --- |
| GET | /clusters | List all clusters |
| POST | /clusters | Create a cluster |
| GET | /clusters/{id} | Get cluster by ID |
| PUT | /clusters/{id} | Update a cluster |
| DELETE | /clusters/{id} | Delete a cluster |

Cluster payload notes:

  • platform_components[] entries are references with per-cluster override fields (id, enabled, optional oci_tag, optional component_path).
  • namespaces[] entries are reference objects (id only).
  • rolebindings[] entries are reference objects (id only).

Platform Components

| Method | Path | Description |
| --- | --- | --- |
| GET | /platform_components | List all catalog components |
| POST | /platform_components | Create a catalog entry |
| GET | /platform_components/{id} | Get component by ID |
| PUT | /platform_components/{id} | Update a catalog entry |
| DELETE | /platform_components/{id} | Delete a catalog entry |

Namespaces

| Method | Path | Description |
| --- | --- | --- |
| GET | /namespaces | List all namespace definitions |
| POST | /namespaces | Create a namespace definition |
| GET | /namespaces/{id} | Get namespace by ID |
| PUT | /namespaces/{id} | Update a namespace definition |
| DELETE | /namespaces/{id} | Delete a namespace definition |

Rolebindings

| Method | Path | Description |
| --- | --- | --- |
| GET | /rolebindings | List all rolebinding definitions |
| POST | /rolebindings | Create a rolebinding definition |
| GET | /rolebindings/{id} | Get rolebinding by ID |
| PUT | /rolebindings/{id} | Update a rolebinding definition |
| DELETE | /rolebindings/{id} | Delete a rolebinding definition |

Service Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Liveness probe — returns {"status": "ok"} |
| GET | /ready | Readiness probe endpoint — currently returns {"status": "ok"} |
| GET | /openapi.yaml | OpenAPI 3.0 specification document |

Error Responses

| Status | Condition |
| --- | --- |
| 401 Unauthorized | Missing or invalid bearer token |
| 404 Not Found | Cluster DNS or resource ID not found |
| 500 Internal Server Error | Data store connection error |

Merge Logic

The merge logic is the critical path in the API. It takes raw cluster documents and catalog entries and produces the computed response that Flux consumes. Understanding the merge is key to understanding the entire system.

Platform Components Merge

This is the most complex merge. It combines three data sources into a single response:

flowchart TD
    A["Cluster Document"] --> D["Merge Logic"]
    B["Component Catalog"] --> D
    C["Cluster Patches"] --> D
    D --> E["Flux Response<br/>{inputs: [...]}"]

    A -.- A1["platform_components[]<br/>per-cluster overrides"]
    B -.- B1["Default oci_tag, component_path,<br/>oci_url, depends_on"]
    C -.- C1["patches[component_id]<br/>key-value overrides"]

Merge Rules

For each component in the cluster’s platform_components array:

| Field | Source | Rule |
| --- | --- | --- |
| id | Cluster component ref | Passed through |
| enabled | Cluster component ref | Passed through |
| component_path | Cluster override OR catalog default | Cluster override wins if non-null |
| component_version | Catalog | Always from catalog |
| cluster_env_enabled | Catalog | Always from catalog (template handles path appending) |
| source.oci_url | Catalog | Always from catalog |
| source.oci_tag | Cluster override OR catalog default | Cluster override wins if non-null |
| depends_on | Catalog | Always from catalog |
| patches | Cluster patches[component_id] | Empty {} if no patches for this component |
| cluster.name | Cluster doc | From cluster’s cluster_name |
| cluster.dns | Cluster doc | From cluster’s cluster_dns |
| cluster.environment | Cluster doc | From cluster’s environment |
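Taken together, the rules reduce to a small per-component function. The service itself is Rust; this Python sketch (names illustrative, defaults assumed where the rules are silent) shows the shape of the merge:

```python
def merge_component(ref, catalog, cluster):
    """Merge one component ref with its catalog entry and cluster identity.

    Cluster override wins when non-null; the catalog supplies defaults;
    the cluster document supplies identity and patches.
    """
    return {
        "id": ref["id"],
        "enabled": ref["enabled"],
        "component_path": ref.get("component_path") or catalog["component_path"],
        "component_version": catalog.get("component_version", "latest"),
        "cluster_env_enabled": catalog.get("cluster_env_enabled", False),
        "depends_on": catalog.get("depends_on", []),
        "patches": cluster.get("patches", {}).get(ref["id"], {}),
        "source": {
            "oci_url": catalog["oci_url"],
            "oci_tag": ref.get("oci_tag") or catalog["oci_tag"],
        },
        "cluster": {
            "name": cluster["cluster_name"],
            "dns": cluster["cluster_dns"],
            "environment": cluster["environment"],
        },
    }
```

The worked example below exercises exactly these rules: cert-manager falls through to catalog defaults, while grafana's non-null overrides and patch entry win.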

Merge Example

Given this cluster document:

{
  "cluster_name": "us-east-prod-01",
  "cluster_dns": "us-east-prod-01.k8s.example.com",
  "environment": "prod",
  "platform_components": [
    { "id": "cert-manager", "enabled": true, "oci_tag": null, "component_path": null },
    { "id": "grafana", "enabled": true, "oci_tag": "v1.0.0-1", "component_path": "observability/grafana/17.1.0" }
  ],
  "patches": {
    "grafana": { "GRAFANA_REPLICAS": "3" }
  }
}

And this catalog:

[
  { "_id": "cert-manager", "component_path": "core/cert-manager/1.14.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": [] },
  { "_id": "grafana", "component_path": "observability/grafana/17.0.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": ["cert-manager"] }
]

The merge produces:

{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "core/cert-manager/1.14.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0" },
      "patches": {},
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    },
    {
      "id": "grafana",
      "component_path": "observability/grafana/17.1.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0-1" },
      "depends_on": ["cert-manager"],
      "patches": { "GRAFANA_REPLICAS": "3" },
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    }
  ]
}

Notice:

  • cert-manager uses catalog defaults for everything (cluster overrides are null)
  • grafana uses cluster override for oci_tag (v1.0.0-1) and component_path (observability/grafana/17.1.0)
  • grafana gets the per-cluster patch (GRAFANA_REPLICAS: "3")
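
The override rules above can be sketched in a few lines (illustrative Python, not the service's actual Rust code; field handling is simplified):

```python
# Sketch of the merge for one component: cluster overrides win when
# non-null, catalog fields pass through, and patches are looked up
# by component id.
def merge_component(ref, catalog_entry, cluster_doc):
    return {
        "id": ref["id"],
        "component_path": ref.get("component_path") or catalog_entry["component_path"],
        "source": {
            "oci_url": catalog_entry["oci_url"],
            "oci_tag": ref.get("oci_tag") or catalog_entry["oci_tag"],
        },
        "patches": cluster_doc.get("patches", {}).get(ref["id"], {}),
        "cluster": {
            "name": cluster_doc["cluster_name"],
            "dns": cluster_doc["cluster_dns"],
            "environment": cluster_doc["environment"],
        },
    }

cluster = {
    "cluster_name": "us-east-prod-01",
    "cluster_dns": "us-east-prod-01.k8s.example.com",
    "environment": "prod",
    "patches": {"grafana": {"GRAFANA_REPLICAS": "3"}},
}
catalog = {"_id": "grafana", "component_path": "observability/grafana/17.0.0",
           "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0"}
ref = {"id": "grafana", "enabled": True, "oci_tag": "v1.0.0-1",
       "component_path": "observability/grafana/17.1.0"}

out = merge_component(ref, catalog, cluster)
assert out["component_path"] == "observability/grafana/17.1.0"  # cluster override wins
assert out["source"]["oci_tag"] == "v1.0.0-1"                   # cluster override wins
assert out["patches"] == {"GRAFANA_REPLICAS": "3"}              # per-cluster patch injected
```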

Namespaces Merge

Namespaces now use a reference + lookup model:

flowchart TD
    A["Cluster Document"] --> C["Merge Logic"]
    B["Namespace Definitions"] --> C
    C --> D["Flux Response"]

    A -.- A1["namespaces[]<br/>id references"]
    B -.- B1["id, labels, annotations"]
    D -.- D1["Each namespace gets<br/>cluster block nested in"]

Merge steps:

  1. Read cluster.namespaces[] as ID references.
  2. Resolve each ID from the namespace definitions store.
  3. Return resolved namespace payload + nested cluster block (name, dns, environment).
  4. Any missing referenced IDs are skipped in Flux response generation.

Rolebindings Merge

Rolebindings follow the same pattern as namespaces:

  1. Read cluster.rolebindings[] as ID references.
  2. Resolve each ID from the rolebinding definitions store.
  3. Return resolved rolebinding payload (id, role, subjects[]) + nested cluster block.
  4. Any missing referenced IDs are skipped in Flux response generation.
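
The reference + lookup model shared by namespaces and rolebindings can be sketched as follows (illustrative Python; the IDs are hypothetical):

```python
# Sketch of the reference + lookup model: resolve IDs against a
# definitions store, nest the cluster block, and silently skip
# references that don't exist (step 4 above).
definitions = {
    "team-a-ns": {"id": "team-a-ns", "labels": {"team": "a"}},
    "team-b-ns": {"id": "team-b-ns", "labels": {"team": "b"}},
}
cluster_block = {"name": "us-east-prod-01",
                 "dns": "us-east-prod-01.k8s.example.com",
                 "environment": "prod"}

def resolve_refs(ids, store, cluster):
    # Missing IDs are skipped rather than raising an error.
    return [{**store[i], "cluster": cluster} for i in ids if i in store]

inputs = resolve_refs(["team-a-ns", "does-not-exist", "team-b-ns"],
                      definitions, cluster_block)
assert [i["id"] for i in inputs] == ["team-a-ns", "team-b-ns"]
assert inputs[0]["cluster"]["environment"] == "prod"
```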

Why Merge Matters

The merge logic is what makes this system more than a simple proxy. It enables:

  1. Catalog defaults — define a component once, inherit everywhere
  2. Per-cluster overrides — pin a specific cluster to a hotfix version without affecting others
  3. Per-cluster patches — inject environment-specific values without touching the component definition
  4. Computed responses — the cluster gets exactly the state it needs, computed from multiple data sources

Without the merge, you would need to duplicate the full component definition per cluster — which is exactly the problem this architecture solves.

Configuration & Deployment

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| API_MODE | no | read-only | Runtime mode: read-only or crud |
| STORE_BACKEND | no | sqlite | Data backend: sqlite or memory |
| DATABASE_URL | no | sqlite://data/flux-resourceset.db?mode=rwc | SQLite DSN when STORE_BACKEND=sqlite |
| AUTH_TOKEN | yes | (none) | Bearer token for read routes |
| CRUD_AUTH_TOKEN | no | AUTH_TOKEN | Bearer token for write routes in CRUD mode |
| SEED_FILE | no | data/seed.json | Seed data file loaded at startup |
| OPENAPI_FILE | no | openapi/openapi.yaml | OpenAPI document served at /openapi.yaml |
| LISTEN_ADDR | no | 0.0.0.0:8080 | Bind address |
| RUST_LOG | no | unset | Tracing filter directive |

Runtime Modes

read-only

The default mode. Serves only Flux read endpoints (/api/v2/flux/...) and service endpoints (/health, /ready, /openapi.yaml). Designed for high-concurrency polling from many clusters.

API_MODE=read-only AUTH_TOKEN=my-token cargo run

crud

Full CRUD mode. Includes all read endpoints plus REST endpoints for clusters, platform_components, namespaces, and rolebindings. Used by operators and CI/CD pipelines.

API_MODE=crud AUTH_TOKEN=read-token CRUD_AUTH_TOKEN=write-token cargo run

Production Deployment

Kubernetes Deployment (read-only)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flux-api-read
  namespace: flux-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flux-api-read
  template:
    metadata:
      labels:
        app: flux-api-read
    spec:
      containers:
        - name: flux-api
          image: flux-resourceset:latest
          ports:
            - containerPort: 8080
          env:
            - name: API_MODE
              value: "read-only"
            - name: STORE_BACKEND
              value: "sqlite"
            - name: DATABASE_URL
              value: "sqlite:///var/lib/flux-resourceset/flux-resourceset.db?mode=rwc"
            - name: SEED_FILE
              value: "/seed/seed.json"
            - name: AUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: internal-api-token
                  key: token
            - name: RUST_LOG
              value: "info"
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 200m
              memory: 64Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 2
            periodSeconds: 5

Resource requests are deliberately small — Rust’s efficiency means this service uses minimal resources. Run 2+ replicas for high availability, not for throughput.

Performance Characteristics

Each request does a data store lookup and a merge. Expected latency is sub-millisecond for the in-memory backend and typically single-digit milliseconds for SQLite on local SSD.

| Clusters | Poll Interval | Requests/sec |
|---|---|---|
| 50 | 5 min | 0.17 |
| 200 | 5 min | 0.67 |
| 1,000 | 5 min | 3.3 |
| 5,000 | 5 min | 16.7 |

Even at 5,000 clusters with three resource types each, the load is ~50 req/sec — trivial for a Rust/axum service.
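
The figures above follow from a simple steady-state formula (sketch):

```python
# Back-of-the-envelope polling load: requests spread evenly across
# the poll interval give clusters * endpoints / interval req/sec.
def poll_rps(clusters, endpoints_per_cluster, interval_seconds):
    return clusters * endpoints_per_cluster / interval_seconds

# One endpoint per cluster, 5-minute interval (matches the table)
assert round(poll_rps(1_000, 1, 5 * 60), 1) == 3.3
# Three resource-type endpoints per cluster at 5,000 clusters
assert poll_rps(5_000, 3, 5 * 60) == 50.0
```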

Build Commands

cargo build                    # Build API + CLI
cargo build --bin flux-resourceset-cli  # Build CLI only
cargo test                     # Run all tests
cargo clippy -- -D warnings    # Lint
cargo fmt                      # Format

Docker

make docker-build              # Build container image

Code Generation

The project uses Firestone for schema-driven code generation:

make generate

This regenerates:

  • openapi/openapi.yaml — OpenAPI 3.0 spec
  • src/models/ — Rust model structs
  • src/apis/ — Rust API client modules
  • src/generated/cli/ — CLI command modules

ResourceSet Templates

ResourceSet templates are the bridge between API data and Kubernetes resources. They use the Flux Operator’s templating engine to render manifests from the {"inputs": [...]} response.

Upstream reference: See the full ResourceSet CRD documentation for all available spec fields, status conditions, and advanced features like inventory tracking and garbage collection.

Template Syntax

ResourceSet uses << and >> as delimiters (not {{/}}). This avoids conflicts with Helm templates and Go templates in the rendered YAML.

Key template functions:

  • << inputs.field >> — access input fields
  • << inputs.nested.field >> — access nested objects
  • << inputs.field | slugify >> — slugify a string for use in Kubernetes names
  • <<- range $k, $v := inputs.object >> — iterate over object keys
  • <<- range $item := inputs.array >> — iterate over arrays
  • <<- if inputs.field >> — conditional rendering
  • <<- if ne inputs.field "value" >> — conditional with comparison
  • << inputs.object | toYaml | nindent N >> — convert to YAML with indentation

Platform Components Template

This is the most complex template. For each component input, it renders up to three resources:

flowchart TD
    I["Input from API<br/>(one per component)"] --> CM{"Has patches?"}
    CM -->|yes| ConfigMap["ConfigMap<br/>values-{id}-{cluster}"]
    CM -->|no| Skip["Skip ConfigMap"]
    I --> HR["HelmRepository<br/>charts-{id}"]
    I --> HRL["HelmRelease<br/>platform-{id}"]
    ConfigMap -.->|"valuesFrom"| HRL
    HR -.->|"sourceRef"| HRL

Full Template

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
  name: platform-components
  namespace: flux-system
spec:
  inputsFrom:
    - name: platform-components
  resourcesTemplate: |
    <<- if inputs.enabled >>
    <<- if inputs.patches >>
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
      namespace: flux-system
    data:
      <<- range $key, $value := inputs.patches >>
      << $key >>: "<< $value >>"
      <<- end >>
    <<- end >>
    ---
    apiVersion: source.toolkit.fluxcd.io/v1
    kind: HelmRepository
    metadata:
      name: charts-<< inputs.id | slugify >>
      namespace: flux-system
    spec:
      interval: 30m
      url: "<< inputs.source.oci_url >>"
    ---
    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: platform-<< inputs.id >>
      namespace: flux-system
    spec:
      interval: 10m
      releaseName: << inputs.id | slugify >>
      targetNamespace: << inputs.id | slugify >>
      install:
        remediation:
          retries: 3
      upgrade:
        remediation:
          retries: 3
      chart:
        spec:
          chart: << inputs.component_path >>
          sourceRef:
            kind: HelmRepository
            name: charts-<< inputs.id | slugify >>
            namespace: flux-system
          interval: 10m
          <<- if ne inputs.component_version "latest" >>
          version: "<< inputs.component_version >>"
          <<- end >>
      <<- if inputs.depends_on >>
      dependsOn:
        <<- range $dep := inputs.depends_on >>
        - name: platform-<< $dep >>
        <<- end >>
      <<- end >>
      <<- if inputs.patches >>
      valuesFrom:
        <<- range $key, $_ := inputs.patches >>
        - kind: ConfigMap
          name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
          valuesKey: << $key >>
          targetPath: << $key >>
        <<- end >>
      <<- end >>
    <<- end >>

What Each Section Does

Enabled check (<<- if inputs.enabled >>) — If the component is disabled, nothing is rendered. Flux garbage-collects previously rendered resources.

ConfigMap for patches — If the component has patches, a ConfigMap is created with the key-value pairs. The HelmRelease references this ConfigMap via valuesFrom, which maps each key to a Helm value path using targetPath.

HelmRepository — Points to the chart repository URL from inputs.source.oci_url.

HelmRelease — The core resource. Key behaviors:

  • chart references the HelmRepository and uses inputs.component_path as the chart name
  • version is only set if component_version is not "latest"
  • dependsOn creates ordering dependencies between components
  • valuesFrom injects per-cluster patches from the ConfigMap

Namespaces Template

Renders a Kubernetes Namespace for each input:

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
  name: namespaces
  namespace: flux-system
spec:
  inputsFrom:
    - name: namespaces
  resourcesTemplate: |
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: << inputs.id >>
      labels:
        <<- range $k, $v := inputs.labels >>
        << $k >>: "<< $v >>"
        <<- end >>
      annotations:
        <<- range $k, $v := inputs.annotations >>
        << $k >>: "<< $v >>"
        <<- end >>

Labels and annotations from the API response are dynamically rendered using range.

Rolebindings Template

Renders a ClusterRoleBinding for each input:

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
  name: rolebindings
  namespace: flux-system
spec:
  inputsFrom:
    - name: rolebindings
  resourcesTemplate: |
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: << inputs.id >>
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: << inputs.role >>
    subjects:
      <<- range $s := inputs.subjects >>
      - kind: << $s.kind >>
        name: << $s.name >>
        apiGroup: << $s.apiGroup >>
      <<- end >>

Template Design Principles

  1. One ResourceSet per resource type — keeps templates focused and failures isolated
  2. Conditional rendering — use if blocks to skip disabled components or optional fields
  3. Slugify names — Kubernetes resource names must be DNS-compatible; slugify handles this
  4. Garbage collection — when an input disappears from the API response, Flux removes the resources that ResourceSet previously created
  5. No cluster-specific logic in templates — all cluster differentiation comes from the API data, not from template conditionals

ResourceSetInputProvider

The ResourceSetInputProvider is the Flux Operator CRD that tells a ResourceSet where to fetch its input data. In this architecture, every provider uses type: ExternalService to call the flux-resourceset API.

Upstream reference: See the full ResourceSetInputProvider CRD documentation for all supported input types, authentication options, and status conditions.

How Providers Work

flowchart LR
    subgraph "flux-system namespace"
        P["ResourceSetInputProvider<br/>type: ExternalService"]
        S["Secret<br/>internal-api-token"]
        RS["ResourceSet"]
    end

    API["flux-resourceset API"]

    P -->|"GET (with bearer token)"| API
    S -.->|"secretRef"| P
    P -->|"provides inputs"| RS
    RS -->|"renders resources"| K8s["Kubernetes Resources"]

Provider Configuration

Each provider specifies:

  • type — ExternalService (calls an HTTP API)
  • url — the endpoint to call
  • secretRef — Kubernetes Secret containing the bearer token
  • reconcileEvery — how often to poll (annotation)

Platform Components Provider

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
  name: platform-components
  namespace: flux-system
  annotations:
    fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
  type: ExternalService
  url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components
  insecure: true
  secretRef:
    name: internal-api-token

Namespaces Provider

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
  name: namespaces
  namespace: flux-system
  annotations:
    fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
  type: ExternalService
  url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/namespaces
  insecure: true
  secretRef:
    name: internal-api-token

Rolebindings Provider

Same pattern with /rolebindings endpoint.

URL Construction

In production, the provider URL uses variable substitution from the cluster-identity ConfigMap:

url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/platform-components"

This means the same provider manifest works on every cluster — only the ConfigMap values differ.

In the demo, the URL is hardcoded to the in-cluster service address and a demo cluster DNS.
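
The substitution itself amounts to a simple template expansion (illustrative Python sketch, not Flux's actual implementation):

```python
import re

def substitute(template, values):
    # Replace each ${VAR} with its value from the cluster-identity
    # ConfigMap; unknown variables raise KeyError.
    return re.sub(r"\$\{(\w+)\}", lambda m: values[m.group(1)], template)

url = substitute(
    "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/platform-components",
    {"INTERNAL_API_URL": "http://flux-api-read.flux-system.svc.cluster.local:8080",
     "CLUSTER_DNS": "us-east-prod-01.k8s.example.com"},
)
assert url == ("http://flux-api-read.flux-system.svc.cluster.local:8080"
               "/api/v2/flux/clusters/us-east-prod-01.k8s.example.com"
               "/platform-components")
```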

Authentication

The provider references a Secret that contains the bearer token:

apiVersion: v1
kind: Secret
metadata:
  name: internal-api-token
  namespace: flux-system
type: Opaque
stringData:
  token: "your-bearer-token-here"

The Flux Operator sends this as Authorization: Bearer <token> on every request.

For production, consider:

  • Token rotation — update the Secret, Flux picks up the new token on next request
  • mTLS — ResourceSetInputProvider supports certSecretRef for TLS client certificates

Reconciliation Behavior

| Event | Provider Behavior |
|---|---|
| Scheduled interval | Provider calls the API, ResourceSet re-renders if inputs changed |
| API returns same data | No change — ResourceSet does not re-render |
| API returns new data | ResourceSet re-renders, Flux applies the diff |
| API returns error | Provider goes not-ready, existing resources continue running |
| API unreachable | Same as error — graceful degradation |
| Manual trigger | Annotate with fluxcd.controlplane.io/requestedAt to force immediate reconcile |

Forcing Immediate Reconciliation

kubectl annotate resourcesetinputprovider platform-components -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite

Observing Provider Status

# Check provider status
kubectl get resourcesetinputproviders -n flux-system

# Detailed status with conditions
kubectl describe resourcesetinputprovider platform-components -n flux-system

# Check ResourceSet status
kubectl get resourcesets -n flux-system

Dynamic Patching

Dynamic patching is one of the most powerful features of this architecture. It allows per-cluster, per-component value overrides without modifying Git, the component catalog, or any template. Operators can change Helm values, replica counts, feature flags, and more — and Flux reconciles the change automatically.

How Patching Works

sequenceDiagram
    participant Op as Operator
    participant API as flux-resourceset API
    participant DB as Data Store
    participant Flux as Child Cluster (Flux)

    Op->>API: PATCH cluster "us-east-prod-01"<br/>patches.grafana.replicaCount = "3"
    API->>DB: Update cluster document
    API-->>Op: 200 OK

    Note over Flux: Next poll cycle

    Flux->>API: GET /clusters/{dns}/platform-components
    API->>DB: Read cluster + catalog
    API->>API: Merge: inject patches.grafana into grafana input
    API-->>Flux: {"inputs": [{..., "patches": {"replicaCount": "3"}}]}

    Flux->>Flux: ResourceSet renders ConfigMap with replicaCount=3
    Flux->>Flux: HelmRelease references ConfigMap via valuesFrom
    Flux->>Flux: Helm upgrade applies new replica count

The Patches Object

Patches are stored in the cluster document, keyed by component ID:

{
  "cluster_dns": "us-east-prod-01.k8s.example.com",
  "patches": {
    "grafana": {
      "replicaCount": "3",
      "persistence.storageClassName": "ssd"
    },
    "podinfo": {
      "replicaCount": "2",
      "ui.color": "#2f855a",
      "ui.message": "Hello from patches"
    },
    "traefik": {
      "deployment.replicas": "1",
      "service.type": "ClusterIP"
    }
  }
}

Each key in a component’s patches maps to a Helm value path. Dotted keys (like ui.color) map to nested Helm values.
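
The dotted-key-to-nested-values mapping can be sketched as follows (illustrative Python; in practice the valuesFrom targetPath mechanism performs this mapping inside Flux):

```python
def expand(dotted_key, value):
    # Fold right-to-left: "ui.color" -> {"ui": {"color": value}}
    out = value
    for key in reversed(dotted_key.split(".")):
        out = {key: out}
    return out

assert expand("ui.color", "#2f855a") == {"ui": {"color": "#2f855a"}}
assert expand("replicaCount", "2") == {"replicaCount": "2"}
```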

How Patches Become Helm Values

The ResourceSet template renders patches into a ConfigMap, then references it from the HelmRelease via valuesFrom:

flowchart TD
    A["API Response<br/>patches: {replicaCount: '2', ui.color: '#2f855a'}"]
    B["ConfigMap<br/>values-podinfo-cluster"]
    C["HelmRelease<br/>platform-podinfo"]
    D["Helm Chart<br/>podinfo"]

    A -->|"ResourceSet renders"| B
    B -->|"valuesFrom with targetPath"| C
    C -->|"helm upgrade"| D

    B -.- B1["data:<br/>  replicaCount: '2'<br/>  ui.color: '#2f855a'"]
    C -.- C1["valuesFrom:<br/>  - kind: ConfigMap<br/>    valuesKey: replicaCount<br/>    targetPath: replicaCount<br/>  - kind: ConfigMap<br/>    valuesKey: ui.color<br/>    targetPath: ui.color"]

The targetPath in valuesFrom tells Helm where to inject the value in the chart’s values tree. This is a standard Flux HelmRelease feature — the innovation is that the values are computed from the API, not hardcoded in Git.

In the demo template, each generated values ConfigMap is labeled reconcile.fluxcd.io/watch: "Enabled" and each generated HelmRelease uses interval: 1m. This gives fast event-driven upgrades when values change, plus a short periodic poll interval.

Patching via CLI

The demo includes a CLI command to patch any component with dynamic key=value paths:

# Patch podinfo values on demo-cluster-01
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
  --set replicaCount=3 \
  --set ui.message="Hello from CLI patch" \
  --set ui.color="#3b82f6"

This updates the cluster document’s patches.podinfo object in the data store.

Patching Use Cases

| Use Case | Patch Example | Effect |
|---|---|---|
| Scale a component | {"replicaCount": "3"} | Component scales to 3 replicas |
| Change UI branding | {"ui.color": "#ff0000", "ui.message": "Maintenance"} | Application UI reflects new values |
| Environment-specific tuning | {"resources.limits.memory": "512Mi"} | Different resource limits per cluster |
| Feature flags | {"feature.newDashboard": "true"} | Enable features per cluster |
| Ingress configuration | {"ingress.className": "internal"} | Different ingress class per cluster |

Patching vs. Other Override Mechanisms

| Mechanism | Scope | Requires Git PR? | Use Case |
|---|---|---|---|
| Catalog defaults | All clusters using the component | Yes (schema change) | Global default values |
| OCI tag override | One cluster, one component | No (API call) | Hotfix or canary version |
| Component path override | One cluster, one component | No (API call) | Component version upgrade |
| Patches | One cluster, one component | No (API call) | Value tuning, feature flags, scaling |
| Template changes | All clusters (template is global) | Yes (Git PR) | Changing how resources are rendered |

Patches are the most granular override — they change individual Helm values without affecting any other cluster or component.

Verifying Patches

After patching, verify the change propagated:

# Reconcile quickly
flux reconcile helmrelease platform-podinfo -n flux-system --with-source

# Check reconcile result
kubectl get hr -n flux-system platform-podinfo \
  -o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'

# Check the actual deployment
kubectl get deploy -n podinfo podinfo \
  -o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'

# Check rendered values
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
  -o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'

Multi-Cluster Management

This architecture is designed from the ground up for managing hundreds to thousands of Kubernetes clusters. The phone-home model, stateless API, and resource-driven data model all contribute to linear scaling without operational complexity growth.

Scaling Properties

graph TB
    subgraph "Enterprise Fleet"
        direction TB
        DEV1["DEV Cluster 1"]
        DEV2["DEV Cluster 2"]
        DEV3["DEV Cluster ...N"]
        QA1["QA Cluster 1"]
        QA2["QA Cluster 2"]
        UAT1["UAT Cluster 1"]
        PROD1["PROD Cluster 1"]
        PROD2["PROD Cluster 2"]
        PROD3["PROD Cluster ...N"]
    end

    API["flux-resourceset API<br/>(stateless, multi-replica)"]

    DEV1 & DEV2 & DEV3 -->|"poll"| API
    QA1 & QA2 -->|"poll"| API
    UAT1 -->|"poll"| API
    PROD1 & PROD2 & PROD3 -->|"poll"| API

Why It Scales

| Property | How |
|---|---|
| Stateless API | No per-cluster state in the API process. Add replicas for HA, not for capacity. |
| Pull-based | Each cluster owns its own reconciliation loop. The API does not need to track cluster connectivity. |
| Minimal request cost | Each request = 1 data store read + 1 merge. Sub-millisecond response time. |
| Independent failures | One cluster's provider failing does not affect any other cluster. |
| Linear polling load | 1,000 clusters polling 3 endpoints every 5 minutes = 10 req/sec. Trivial for any HTTP service. |

Fleet-Wide Operations

Rolling Out a New Component

When a new platform component needs to be deployed across the fleet:

flowchart TD
    A["1. Add to component catalog<br/>(one API call)"] --> B["2. Add component_ref to<br/>target clusters<br/>(batch API calls)"]
    B --> C["3. Clusters poll on schedule"]
    C --> D["4. Each cluster independently<br/>installs the component"]

    B -->|"DEV first"| C1["DEV clusters pick up change"]
    B -->|"then QA"| C2["QA clusters pick up change"]
    B -->|"then PROD"| C3["PROD clusters pick up change"]

You control rollout speed by controlling when you add the component_ref to each tier’s clusters. No pipeline orchestration — just API calls.

Upgrading a Component Version

To upgrade grafana from 17.0.0 to 17.1.0 across the fleet:

  1. Ensure the new version exists in the platform components OCI artifact
  2. Update the catalog’s component_path from observability/grafana/17.0.0 to observability/grafana/17.1.0
  3. All clusters using catalog defaults pick up the change on next poll

For canary rollouts, override specific clusters first:

{
  "platform_components": [
    {
      "id": "grafana",
      "component_path": "observability/grafana/17.1.0",
      "oci_tag": "v1.1.0-rc1"
    }
  ]
}

DEV gets the new version. PROD stays on the catalog default.

Hotfix Workflow

flowchart LR
    A["CVE discovered<br/>in cert-manager"] --> B["Fix merged to repo<br/>OCI artifact v1.0.0-1 built"]
    B --> C["Update affected clusters<br/>oci_tag: v1.0.0-1"]
    C --> D["Clusters poll and<br/>reconcile the fix"]

    C -->|"Only cert-manager<br/>is affected"| E["All other components<br/>stay on v1.0.0"]

Hotfixes are per-component, per-cluster. You update oci_tag on the specific component for the specific clusters that need the fix. No full release cycle required.

Environment Tiers

The architecture has first-class support for environment-based differentiation:

| Mechanism | How It Works |
|---|---|
| cluster.environment | Each cluster document has an environment field (dev, qa, uat, prod). Included in every API response. |
| cluster_env_enabled | When true on a catalog component, the ResourceSet template appends /{environment} to the component path. Different environment tiers get different Kustomize overlays. |
| Per-cluster patches | Different Helm values per cluster. PROD gets 5 replicas, DEV gets 1. |
| OCI tag overrides | DEV clusters can pin to release candidates while PROD stays on stable. |

Environment-Aware Path Resolution

When cluster_env_enabled is true:

Catalog component_path: core/cert-manager/1.14.0
Cluster environment: prod
→ Resolved path: core/cert-manager/1.14.0/prod

This enables the platform components repo to have environment-specific Kustomize overlays:

core/cert-manager/1.14.0/
├── base/
│   └── deployment.yaml
├── dev/
│   └── kustomization.yaml
├── qa/
│   └── kustomization.yaml
└── prod/
    └── kustomization.yaml
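
The path resolution can be sketched as follows (illustrative Python):

```python
def resolve_path(component_path, environment, cluster_env_enabled):
    # Append the environment overlay segment only when the catalog
    # flags the component as environment-aware.
    return (f"{component_path}/{environment}"
            if cluster_env_enabled else component_path)

assert resolve_path("core/cert-manager/1.14.0", "prod", True) == \
    "core/cert-manager/1.14.0/prod"
assert resolve_path("core/cert-manager/1.14.0", "prod", False) == \
    "core/cert-manager/1.14.0"
```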

Decommissioning a Cluster

flowchart LR
    A["Delete cluster record<br/>from API"] --> B["All endpoints return<br/>empty inputs"]
    B --> C["ResourceSets render<br/>empty resource list"]
    C --> D["Flux garbage-collects<br/>all resources"]

No manual cleanup. No orphaned resources. The data model drives everything.

Enterprise Benefits Summary

| Benefit | Description |
|---|---|
| Single source of truth | One API holds the desired state for every cluster. No separate configuration management inventory, no spreadsheets, no wiki pages. |
| Cluster creation in minutes | Bootstrap cluster + phone home + reconcile. No weeks-long process involving manual playbooks and ticket queues. |
| Zero state divergence | API data = ResourceSet input = running cluster state. Drift is automatically corrected. |
| Operational velocity | Change a value via API → Flux reconciles. No PR, no review, no pipeline for operational changes. |
| Audit trail | Every API mutation is logged. Template changes go through Git. Full traceability. |
| Team autonomy | Platform engineers own templates (Git). Platform operators own data (API). Flux owns reconciliation. |
| Failure isolation | Each cluster is independent. API outage = no new changes, not a cluster outage. |
| Cost efficiency | Stateless API uses minimal resources. No management cluster scaling with fleet size. |
| Infrastructure-agnostic | Same model works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |

Versioning & Hotfix Strategy

The platform components repo uses a versioning model with two independent axes of change: the OCI artifact tag and the component path within that artifact. This enables fine-grained control over what each cluster runs.

Two Axes of Version Control

graph TD
    subgraph "OCI Artifact (tagged build of the repo)"
        subgraph "v1.0.0"
            A1["core/cert-manager/1.14.0/"]
            A2["observability/grafana/17.0.0/"]
            A3["networking/ingress-nginx/4.9.0/"]
        end
    end

    subgraph "OCI Artifact (hotfix build)"
        subgraph "v1.0.0-1"
            B1["core/cert-manager/1.14.1/ ← fixed"]
            B2["observability/grafana/17.0.0/"]
            B3["networking/ingress-nginx/4.9.0/"]
        end
    end

| Axis | What It Controls | How It Changes |
|---|---|---|
| OCI tag | Which build of the monorepo artifact to pull | New tag on each merge to main (v1.0.0, v1.0.0-1, v1.1.0) |
| Component path | Which version directory within the artifact to use | Update component_path in the API (observability/grafana/17.0.0 → 17.1.0) |

Normal Release Flow

flowchart LR
    A["All components at<br/>v1.0.0"] --> B["New feature merged<br/>to platform repo"]
    B --> C["CI builds and tags<br/>v1.1.0"]
    C --> D["Update catalog<br/>oci_tag: v1.1.0"]
    D --> E["All clusters pull<br/>v1.1.0 on next poll"]

In a normal release, all components point to the same OCI tag. The catalog default is updated, and every cluster picks it up.

Hotfix Flow

flowchart LR
    A["CVE in cert-manager<br/>All clusters on v1.0.0"] --> B["Fix merged to<br/>cert-manager/1.14.1/"]
    B --> C["CI builds and tags<br/>v1.0.0-1"]
    C --> D["Update cert-manager's<br/>oci_tag to v1.0.0-1<br/>component_path to<br/>cert-manager/1.14.1"]
    D --> E["Only cert-manager<br/>updates on affected clusters"]
    E --> F["All other components<br/>stay on v1.0.0"]

Hotfixes use SemVer pre-release suffixes: v1.0.0-1, v1.0.0-2. This keeps them:

  • Sortable — v1.0.0-1 < v1.0.0-2 < v1.1.0
  • Tied to base release — clear which release they patch
  • Temporary — the next full release collapses everything back to one tag
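
A sketch of how such tags sort, treating the hotfix suffix as a numeric fourth version component (an assumption of this sketch; strict SemVer precedence would rank a -1 pre-release before its base release):

```python
def tag_key(tag):
    # "v1.0.0-2" -> (1, 0, 0, 2); a plain release gets hotfix number 0.
    base, _, hotfix = tag.lstrip("v").partition("-")
    return tuple(int(x) for x in base.split(".")) + (int(hotfix or 0),)

tags = ["v1.1.0", "v1.0.0-2", "v1.0.0", "v1.0.0-1"]
assert sorted(tags, key=tag_key) == ["v1.0.0", "v1.0.0-1", "v1.0.0-2", "v1.1.0"]
```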

Per-Cluster Version Pinning

Any cluster can be pinned to a different version than the catalog default:

{
  "platform_components": [
    {
      "id": "grafana",
      "oci_tag": "v1.1.0-rc1",
      "component_path": "observability/grafana/17.1.0"
    }
  ]
}

Use cases:

  • Canary testing — DEV cluster gets the release candidate
  • Rollback — pin a PROD cluster to the previous version while investigating
  • Gradual rollout — update clusters one tier at a time

Component Lifecycle

stateDiagram-v2
    [*] --> CatalogEntry: Add to catalog
    CatalogEntry --> AssignedToCluster: Add component_ref to cluster
    AssignedToCluster --> Running: Flux reconciles
    Running --> Upgraded: Update component_path/oci_tag
    Upgraded --> Running: Flux reconciles new version
    Running --> Hotfixed: Per-cluster oci_tag override
    Hotfixed --> Running: Next full release
    Running --> Disabled: Set enabled=false
    Disabled --> Running: Set enabled=true
    Running --> Removed: Remove component_ref
    Removed --> GarbageCollected: Flux cleans up
    GarbageCollected --> [*]

Platform Components Repo Structure

appteam-flux-repo/
├── COMPONENTS.yaml              # Registry — CI-validated
├── core/
│   └── cert-manager/
│       ├── 1.14.0/
│       │   ├── base/            # Shared resources
│       │   ├── dev/
│       │   │   └── kustomization.yaml
│       │   ├── qa/
│       │   │   └── kustomization.yaml
│       │   └── prod/
│       │       └── kustomization.yaml
│       └── 1.14.1/              # Hotfix version
│           └── ...
├── observability/
│   └── grafana/
│       ├── 17.0.0/
│       │   └── ...
│       └── 17.1.0/              # Upgrade version
│           └── ...
└── networking/
    └── ingress-nginx/
        └── 4.9.0/
            └── ...

Each environment directory must be buildable in isolation: kustomize build core/cert-manager/1.14.0/prod/ must succeed.

Version Cleanup

Keep N previous versions per component (recommended: 3). CI can prune older version directories. Old OCI tags remain in the registry for emergency rollbacks.
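
A retention policy of this kind can be sketched as follows (illustrative Python; version directory names are assumed to be plain dotted numbers):

```python
def versions_to_prune(versions, keep=3):
    # Sort numerically ("17.1.0" -> (17, 1, 0)) and keep the newest N.
    ordered = sorted(versions, key=lambda v: tuple(int(x) for x in v.split(".")))
    return ordered[:-keep] if len(ordered) > keep else []

# With keep=3, only the oldest directory is pruned
assert versions_to_prune(["1.14.0", "1.14.1", "17.0.0", "17.1.0"]) == ["1.14.0"]
# Fewer than N versions: nothing is pruned
assert versions_to_prune(["17.0.0", "17.1.0"]) == []
```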

Security & Authentication

Security in this architecture operates at multiple layers: API authentication, cluster identity, network boundaries, and credential management.

Authentication Model

flowchart TD
    subgraph "Child Cluster"
        S["Secret: internal-api-token"]
        P["ResourceSetInputProvider"]
        S -->|"Bearer token"| P
    end

    subgraph "API Layer"
        AUTH["Auth Middleware"]
        API["flux-resourceset"]
        AUTH -->|"validated"| API
    end

    P -->|"Authorization: Bearer <token>"| AUTH

Bearer Token Authentication

The API uses bearer token authentication. Tokens are configured via environment variables:

  • AUTH_TOKEN — required for all read endpoints
  • CRUD_AUTH_TOKEN — required for write endpoints in CRUD mode (defaults to AUTH_TOKEN if not set)

This separation allows:

  • Read-only clusters to use a shared read token
  • Operators/CI to use a separate write token
  • Token rotation without affecting cluster polling (rotate read and write tokens independently)

Cluster-Side Token Storage

Each cluster stores the token in a Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  name: internal-api-token
  namespace: flux-system
type: Opaque
stringData:
  token: "the-bearer-token"

This Secret is either:

  • Pre-installed in the cluster’s bootstrap image or manifests
  • Injected during cluster provisioning (via cloud-init, Terraform, Cluster API, or manual setup)
  • Managed by an external secrets operator that fetches the token from a vault

Upgrading to mTLS

For stricter security requirements, the ResourceSetInputProvider supports TLS client certificates via certSecretRef:

spec:
  type: ExternalService
  url: https://internal-api.internal.example.com/api/v2/flux/...
  certSecretRef:
    name: api-client-cert

This eliminates shared bearer tokens in favor of per-cluster X.509 certificates. The API would need to be configured with a TLS server certificate and a CA trust chain.

Network Security

| Connection | Direction | Protocol | Authentication |
|---|---|---|---|
| Cluster → API | Outbound from cluster | HTTPS | Bearer token or mTLS |
| Operator → API (CRUD) | Inbound to CRUD instance | HTTPS | Bearer token (write) |
| API → Data Store | Local/internal | SQLite file access (or in-memory) | Filesystem permissions |

Network Policy Considerations

  • The API does not need inbound access to clusters — it is purely pull-based
  • Only the flux-system namespace on each cluster needs outbound access to the API
  • CRUD endpoints should be restricted to operator networks or CI/CD runners

Cluster Identity

The cluster-identity ConfigMap is the root of trust for each cluster:

data:
  CLUSTER_NAME: "us-east-prod-01"
  CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
  ENVIRONMENT: "prod"
  INTERNAL_API_URL: "https://internal-api.internal.example.com"

This ConfigMap determines:

  • Which API endpoint the cluster calls
  • Which cluster DNS is used in the URL path (determines what data the cluster receives)
  • What environment tier the cluster belongs to

The ConfigMap is injected during cluster provisioning and should be treated as immutable after bootstrap.
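
The endpoint a cluster polls is a pure function of these values. A small sketch of the derivation, with the URL layout taken from the Flux endpoints used elsewhere in this guide:

```shell
# Derive the per-cluster platform-components endpoint from the
# cluster-identity values INTERNAL_API_URL and CLUSTER_DNS.
build_components_url() {
  local api_url="$1" cluster_dns="$2"
  printf '%s/api/v2/flux/clusters/%s/platform-components\n' \
    "$api_url" "$cluster_dns"
}

build_components_url "https://internal-api.internal.example.com" \
  "us-east-prod-01.k8s.internal.example.com"
```

This is exactly what the ResourceSetInputProvider manifests do via variable substitution, which is why a wrong CLUSTER_DNS silently yields another cluster's data rather than an error.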

Data Access Control

The API enforces access control at the endpoint level:

| Endpoint | Token Required | Access Level |
|---|---|---|
| /api/v2/flux/... | AUTH_TOKEN | Read-only — clusters can only read their own data via DNS path |
| /clusters, /platform_components, etc. | CRUD_AUTH_TOKEN | Read-write — operators can modify any cluster |
| /health, /ready | None | Public — Kubernetes probes |
| /openapi.yaml | None | Public — API documentation |

Per-Cluster Data Isolation

Each cluster can only access its own data because the API path includes the cluster DNS:

GET /api/v2/flux/clusters/us-east-prod-01.k8s.example.com/platform-components

A cluster cannot query another cluster’s configuration without knowing (and requesting) a different DNS path. The bearer token does not provide cross-cluster access control — all clusters share the same read token. If per-cluster token isolation is required, implement it as an API middleware enhancement.
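
One possible shape for that enhancement, sketched here in shell for readability (a real implementation would live inside the API): look up the cluster DNS bound to the presented token, then require it to match the DNS segment of the request path. The token-to-cluster binding passed as an argument here is purely illustrative.

```shell
# Hypothetical per-cluster read authorization: a request is allowed only
# when the DNS in the path equals the DNS the caller's token is bound to.
authorize_cluster_read() {
  local path="$1" bound_dns="$2" path_dns
  path_dns="$(printf '%s\n' "$path" \
    | sed -n 's|^/api/v2/flux/clusters/\([^/]*\)/.*|\1|p')"
  [ -n "$path_dns" ] && [ "$path_dns" = "$bound_dns" ]
}
```

With this in place, a token bound to us-east-prod-01 can no longer fetch us-west-prod-02's configuration even though both hit the same read endpoint.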

Secrets in the Data Model

The patches object supports arbitrary key-value pairs. Do not store sensitive values (passwords, API keys, private certificates) in patches. Instead:

  • Use Kubernetes Secrets + ExternalSecrets Operator for sensitive values
  • Use patches only for non-sensitive configuration (replica counts, feature flags, resource limits)
  • For sensitive Helm values, use valuesFrom with a Secret instead of a ConfigMap

Local Demo

This guide walks through running the full demo on a local kind cluster. By the end, you will have:

  • A kind cluster with Flux Operator installed
  • The flux-resourceset API deployed with seed data
  • ResourceSetInputProviders polling the API
  • ResourceSets rendering and reconciling platform components, namespaces, and rolebindings

Prerequisites

Required tools:

  • Rust/Cargo — build the API and CLI
  • Docker — container runtime for kind
  • kind — local Kubernetes clusters
  • kubectl — Kubernetes CLI
  • flux CLI — manual reconcile commands (flux reconcile ...)
  • curl — HTTP requests

Optional tools:

  • jq — pretty JSON output
  • Poetry + Python 3 — for make generate (code generation only)
  • openapi-generator — for Rust model generation (code generation only)
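
A preflight check saves a make demo run that fails halfway through. A minimal sketch; the tool list mirrors the required tools above:

```shell
# Preflight: fail fast if any required demo tool is missing from PATH.
require_tools() {
  local missing=0 t
  for t in "$@"; do
    if ! command -v "$t" > /dev/null 2>&1; then
      echo "missing required tool: $t" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `require_tools cargo docker kind kubectl flux curl` before starting; a non-zero exit lists every missing tool at once.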

One-Command Demo

cd flux-resourceset
make demo

This runs kind-create and kind-demo, which:

  1. Builds the Docker image (flux-resourceset:local)
  2. Creates a kind cluster named flux-demo
  3. Loads the image into the cluster
  4. Installs the Flux Operator from upstream
  5. Applies base Kubernetes manifests (FluxInstance, RBAC, services)
  6. Waits for Flux controllers to be ready
  7. Creates a seed data ConfigMap from data/seed.json
  8. Deploys the API (read-only + CRUD instances)
  9. Applies ResourceSetInputProviders
  10. Applies ResourceSets

What Gets Deployed

graph TB
    subgraph "flux-system namespace"
        API_R["flux-api-read<br/>(read-only mode)"]
        API_C["flux-api-crud<br/>(CRUD mode)"]
        SEED["ConfigMap: flux-api-seed-data"]

        P1["Provider: platform-components"]
        P2["Provider: namespaces"]
        P3["Provider: rolebindings"]

        RS1["ResourceSet: platform-components"]
        RS2["ResourceSet: namespaces"]
        RS3["ResourceSet: rolebindings"]

        HR1["HelmRelease: platform-cert-manager"]
        HR2["HelmRelease: platform-traefik"]
        HR3["HelmRelease: platform-podinfo"]
    end

    subgraph "Created namespaces"
        NS1["cert-manager"]
        NS2["traefik"]
        NS3["podinfo"]
    end

    SEED -->|"loaded at startup"| API_R
    SEED -->|"loaded at startup"| API_C
    P1 -->|"polls"| API_R
    P2 -->|"polls"| API_R
    P3 -->|"polls"| API_R
    P1 --> RS1
    P2 --> RS2
    P3 --> RS3
    RS1 -->|"renders"| HR1 & HR2 & HR3
    RS2 -->|"renders"| NS1 & NS2 & NS3

Seed Data

The demo uses data/seed.json which contains:

One cluster: demo-cluster-01

  • Environment: dev
  • 3 platform components: cert-manager, traefik, podinfo
  • 3 namespaces: cert-manager, traefik, podinfo
  • 2 rolebindings: platform-admins (cluster-admin), dev-readers (view)
  • Patches for podinfo (replica count, UI color, UI message) and traefik (replicas, service type)

Three catalog entries: cert-manager, traefik, podinfo — each pointing to public Helm chart repositories.

Checking Status

After make demo, verify everything is running:

# Check pods
kubectl get pods -n flux-system

# Check providers
kubectl get resourcesetinputproviders -n flux-system

# Check resourcesets
kubectl get resourcesets -n flux-system

# Check HelmReleases
kubectl get helmreleases -n flux-system

# Check created namespaces
kubectl get namespaces

# Check rolebindings
kubectl get clusterrolebindings platform-admins dev-readers

Running the CLI Demo

The automated CLI demo flow exercises the full lifecycle:

Step 1: Port-forward the API

make cli-demo-port-forward

This exposes the API on http://127.0.0.1:8080.

Step 2: Run the CLI demo

In another terminal:

make cli-demo

This:

  1. Builds the CLI
  2. Lists clusters and namespaces
  3. Adds a new namespace (demo-runtime) via CLI
  4. Forces reconciliation
  5. Waits for the namespace to be created
  6. Verifies the namespace exists

Step 3: Manual CLI exploration

export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
  -o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"

# List clusters
./target/debug/flux-resourceset-cli cluster list | jq .

# List namespaces
./target/debug/flux-resourceset-cli namespace list | jq .

# Get Flux-formatted platform components
curl -s -H "Authorization: Bearer $FLUX_API_TOKEN" \
  http://127.0.0.1:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components | jq .

Podinfo Patch Demo

This demonstrates dynamic patching — changing Helm values via the API and watching Flux reconcile:

# 1. Check current state
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
  -o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
  -o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'

# 2. Patch via CLI
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
  --set replicaCount=3 \
  --set ui.message="Hello from CLI patch" \
  --set ui.color="#3b82f6" | jq .

# 3. Force reconcile inputs/templates
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite

# 4. Trigger immediate Helm reconcile
flux reconcile helmrelease platform-podinfo -n flux-system --with-source

# 5. Verify
kubectl get hr -n flux-system platform-podinfo \
  -o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
  -o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
  -o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'

# 6. Optional: check the UI
kubectl -n podinfo port-forward svc/podinfo 9898:9898
# Open http://127.0.0.1:9898

Cleanup

make kind-delete
# or
make clean

This deletes the kind cluster and all associated resources.

CLI Usage

flux-resourceset-cli is a command-line tool for interacting with the CRUD API. It is built from the same codebase and generated from the same Firestone schemas as the API.

Building

cd flux-resourceset
cargo build --bin flux-resourceset-cli

The binary is at target/debug/flux-resourceset-cli.

Environment Variables

| Variable | Required | Description |
|---|---|---|
| FLUX_API_URL | yes | API base URL (e.g., http://127.0.0.1:8080) |
| FLUX_API_TOKEN | yes | Bearer token for read operations |
| FLUX_API_WRITE_TOKEN | yes | Bearer token for write operations |

Setup from Demo Cluster

export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
  -o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"

Commands

Cluster Operations

# List all clusters
flux-resourceset-cli cluster list

# Get a specific cluster
flux-resourceset-cli cluster get demo-cluster-01

Namespace Operations

# List all namespaces
flux-resourceset-cli namespace list

# Get a specific namespace
flux-resourceset-cli namespace get cert-manager

# Create namespace record and attach reference to a cluster
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
  --label team=sandbox --annotation owner=platform

# Attach/detach an existing namespace record
flux-resourceset-cli namespace assign team-sandbox --cluster demo-cluster-01
flux-resourceset-cli namespace unassign team-sandbox --cluster demo-cluster-01

Platform Component Operations

# List all catalog components
flux-resourceset-cli component list

# Get a specific component
flux-resourceset-cli component get cert-manager

# Create/ensure catalog component, then attach to cluster
flux-resourceset-cli component create cert-manager \
  --component-path core/cert-manager/1.14.0 \
  --component-version 1.14.0 \
  --oci-url oci://registry.example/platform-components \
  --oci-tag v1.0.0 \
  --cluster demo-cluster-01

# Attach/detach existing component references
flux-resourceset-cli component assign cert-manager --cluster demo-cluster-01
flux-resourceset-cli component unassign cert-manager --cluster demo-cluster-01

# Patch per-cluster component values
flux-resourceset-cli component patch podinfo --cluster demo-cluster-01 --set replicaCount=3

Demo Commands

The CLI includes demo-specific commands for common workflows:

# Add a namespace to a cluster
flux-resourceset-cli demo add-namespace <cluster-id> <namespace> \
  --label team=platform \
  --annotation owner=you

# Patch one component using dynamic key/value paths
flux-resourceset-cli demo patch-component <cluster-id> <component-id> \
  --set replicaCount=3 \
  --set ui.message="Hello" \
  --set ui.color="#3b82f6"

# Get Flux-formatted namespace response
flux-resourceset-cli demo flux-namespaces <cluster-dns>

Output

All CLI commands output JSON. Pipe to jq for pretty formatting:

flux-resourceset-cli cluster list | jq .

Workflow Examples

Add a namespace and watch Flux create it

# 1. Create namespace + attach reference
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
  --label team=sandbox --annotation owner=platform

# 2. Force reconcile
kubectl annotate resourcesetinputprovider namespaces -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset namespaces -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite

# 3. Wait and verify
kubectl get ns team-sandbox

Patch a component and verify

# 1. Patch
flux-resourceset-cli demo patch-component demo-cluster-01 podinfo --set replicaCount=5

# 2. Refresh provider + resourceset
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
  fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite

# 3. Trigger immediate Helm upgrade
flux reconcile helmrelease platform-podinfo -n flux-system --with-source

# 4. Verify
kubectl get deploy -n podinfo podinfo \
  -o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'

Extending with New Resource Types

The architecture is designed to be extended with new resource types beyond the initial three (platform-components, namespaces, rolebindings). Adding a new resource type follows a consistent pattern.

The Pattern

Every resource type requires four pieces:

flowchart TD
    A["1. Data Schema<br/>(Firestone resource definition)"] --> B["2. API Endpoint<br/>(returns {inputs: [...]})"]
    B --> C["3. ResourceSetInputProvider<br/>(calls the endpoint)"]
    C --> D["4. ResourceSet Template<br/>(renders Kubernetes resources)"]

Step-by-Step: Adding Network Policies

Let’s walk through adding a network-policies resource type.

Step 1: Define the Firestone Schema

Create resources/network_policy.yaml:

kind: network_policy
apiVersion: v1
schema:
  type: object
  required: [id, target_namespace, ingress_rules]
  properties:
    id:
      type: string
      example: allow-monitoring
    target_namespace:
      type: string
      example: monitoring
    ingress_rules:
      type: array
      items:
        type: object
        properties:
          from_namespace:
            type: string
          port:
            type: integer

Step 2: Add to the Cluster Schema

In resources/cluster.yaml, add a network_policies array:

network_policies:
  type: array
  items:
    $ref: "#/components/schemas/network_policy_ref"
  description: Network policies to sync to this cluster.

Step 3: Regenerate Code

make generate

This updates the OpenAPI spec, Rust models, and CLI modules.

Step 4: Implement the API Endpoint

Add GET /api/v2/flux/clusters/{cluster_dns}/network-policies that returns:

{
  "inputs": [
    {
      "id": "allow-monitoring",
      "target_namespace": "monitoring",
      "ingress_rules": [
        { "from_namespace": "prometheus", "port": 9090 }
      ],
      "cluster": {
        "name": "us-east-prod-01",
        "dns": "us-east-prod-01.k8s.example.com",
        "environment": "prod"
      }
    }
  ]
}

Step 5: Create the ResourceSetInputProvider

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
  name: network-policies
  namespace: flux-system
  annotations:
    fluxcd.controlplane.io/reconcileEvery: "5m"
spec:
  type: ExternalService
  url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/network-policies"
  secretRef:
    name: internal-api-token

Step 6: Create the ResourceSet Template

apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
  name: network-policies
  namespace: flux-system
spec:
  inputsFrom:
    - name: network-policies
  resourcesTemplate: |
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: << inputs.id >>
      namespace: << inputs.target_namespace >>
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
      ingress:
        <<- range $rule := inputs.ingress_rules >>
        - from:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: << $rule.from_namespace >>
          ports:
            - port: << $rule.port >>
        <<- end >>

Step 7: Deploy

Add the provider and ResourceSet to the bootstrap manifests (for new clusters) and apply them to existing clusters.

What Makes This Extensible

| Aspect | How It Helps |
|---|---|
| Consistent contract | Every resource type uses {"inputs": [...]} — same provider, same pattern |
| Independent providers | Each resource type polls independently — no coupling |
| Schema-driven | Firestone generates models, OpenAPI, and CLI for new types automatically |
| Template isolation | Each ResourceSet template handles one type — no monolithic templates |
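
The inputs contract is mechanical enough to verify in CI. A quick check, assuming jq is available and an endpoint response is piped in on stdin:

```shell
# Check that a JSON response honours the {"inputs": [...]} provider
# contract: a top-level object whose "inputs" key is an array.
check_inputs_contract() {
  jq -e 'if type == "object" then (.inputs | type == "array") else false end' \
    > /dev/null
}

echo '{"inputs":[{"id":"allow-monitoring"}]}' | check_inputs_contract \
  && echo "contract ok"
```

Running this against every new endpoint before wiring up a provider catches shape mistakes before Flux ever sees them.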

Ideas for Additional Resource Types

| Resource Type | Kubernetes Resources | Use Case |
|---|---|---|
| Network Policies | NetworkPolicy | Per-cluster network segmentation |
| Resource Quotas | ResourceQuota, LimitRange | Namespace resource limits |
| Secrets | ExternalSecret (ESO) | Centralized secret management |
| Ingress Routes | Ingress, IngressRoute | Per-cluster routing rules |
| Custom CRDs | Any custom resource | Organization-specific resources |

Each follows the same four-piece pattern: schema, endpoint, provider, template.

Frequently Asked Questions

Architecture & Design Decisions

Why an API instead of direct Kubernetes API access?

A common reaction is: “Why not just give operators kubectl access or build tooling that talks directly to the Kubernetes API on each cluster?”

The answer comes down to control, safety, and scale:

| Concern | Direct Kubernetes API | Purpose-built API (this) |
|---|---|---|
| Blast radius | One bad kubectl apply can break a cluster. Operators need kubeconfig access to every cluster. | All changes flow through a single API with validation. No direct cluster access needed for platform operations. |
| Business logic | The Kubernetes API has no concept of “platform components,” “environment tiers,” or “component catalogs.” You build that logic into scripts. | The API encodes your organization’s domain model. Merge logic, catalog defaults, environment resolution, and patching rules are built in. |
| Audit trail | Kubernetes audit logs are per-cluster and verbose. Correlating “who changed what across 200 clusters” is painful. | One API, one audit log. Every mutation is traceable to a user, timestamp, and change payload. |
| Integration | Integrating CI/CD, chatops, ticketing, or approval workflows with raw Kubernetes APIs across many clusters requires custom glue per cluster. | One REST API to integrate with. Webhooks, CI pipelines, Slack bots, and approval systems all talk to one endpoint. |
| Credential management | Operators (or CI) need kubeconfigs for every cluster. Rotating credentials means touching every cluster. | Operators need one API token. Clusters hold one read token. Token rotation is centralized. |
| Consistency | Without enforcement, two operators can configure the same component differently on two clusters. Scripts drift. | The catalog + merge model guarantees consistent computed state. Per-cluster differences are explicit and auditable. |
| Rollback | Rolling back a kubectl apply requires knowing exactly what was applied and in what order. | Revert the API data. Next poll cycle, Flux reconciles back. |

In short: The Kubernetes API is a powerful infrastructure primitive, but it is not a platform management API. This service adds the domain logic, guardrails, and integration surface that enterprise operations require.

Is this actually GitOps?

Yes — with a nuance. This is a GitOps-based model that adds an API-driven data layer.

The GitOps principles are preserved:

  • Declarative — desired state is declared in structured data (API) and templates (Git)
  • Versioned and immutable — templates are version-controlled in Git. API data changes are auditable and reversible.
  • Pulled automatically — clusters pull their state; no manual push required
  • Continuously reconciled — Flux detects and corrects drift automatically

What the API adds:

  • Dynamic data — instead of static YAML files per cluster, the API computes each cluster’s state from catalog + overrides
  • Operational velocity — data changes (scaling, patching, enabling/disabling) do not require Git PRs
  • Business logic — merge rules, catalog defaults, and environment resolution happen in the API, not in Git overlays

The templates that govern how resources are deployed still live in Git and go through standard review. The API controls what is deployed where — the operational data plane.

Why not ArgoCD ApplicationSets?

ArgoCD ApplicationSets solve a similar problem (managing resources across many clusters) but take a fundamentally different approach:

| Aspect | ArgoCD ApplicationSets | This architecture |
|---|---|---|
| Model | Push from management cluster | Pull from each cluster |
| Management cluster dependency | Required — ArgoCD must maintain connections to all clusters | Not required for platform management — clusters are autonomous |
| Failure mode | Management cluster down = no reconciliation anywhere | API down = clusters keep running, just cannot get updates |
| Kubeconfig management | ArgoCD needs kubeconfigs for every target cluster | Each cluster holds one API bearer token |
| Network direction | Management cluster → target clusters (requires inbound access to clusters) | Target clusters → API (outbound only) |
| Data source | Git repos with generators (list, cluster, git, matrix) | API with merge logic and dynamic catalog |
| Per-cluster overrides | Generators + overlays (can get complex) | First-class patches object in the API |

Both are valid approaches. ApplicationSets work well when you have a stable management cluster with reliable connectivity to all targets. The phone-home model works better when clusters are distributed, network connectivity is unreliable, or you need clusters to be autonomous.

Does this work on-premises?

Yes. The architecture is infrastructure-agnostic. It has no dependency on any specific cloud provider, VM provisioner, or Kubernetes distribution.

| Environment | Requirements |
|---|---|
| On-prem bare metal | Kubernetes cluster with Flux Operator installed. Outbound HTTPS to the API. |
| On-prem VMs | Same — any hypervisor (VMware, KVM, Hyper-V). |
| Public cloud (EKS, AKS, GKE) | Deploy Flux Operator as a Helm chart or add-on. |
| Edge / remote sites | Lightweight K8s (k3s, k0s, MicroK8s). Can work over VPN or direct internet. |
| Air-gapped | Possible with a local API mirror and OCI registry mirror inside the air gap. |
| Hybrid | Mix any of the above. Every cluster phones home to the same API. |

The provisioning tooling is completely decoupled. Whether you use Terraform, Cluster API, Crossplane, Rancher, manual scripts, or your own management cluster — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop works.

Why separate read-only and CRUD modes?

The two modes serve fundamentally different access patterns:

| Mode | Consumers | Pattern | Scaling |
|---|---|---|---|
| read-only | Hundreds/thousands of clusters polling | High concurrency, small payloads, predictable load | Multi-replica, horizontal scaling |
| crud | Operators, CLI, CI/CD pipelines | Low concurrency, larger payloads, bursty | Single replica or small deployment |

Separating them gives you:

  • Independent scaling — read replicas scale with fleet size; CRUD does not need to
  • Security boundary — read-only instances never accept writes; separate tokens for each
  • Blast radius — a CRUD deployment issue does not affect cluster polling
  • Simpler operations — read-only instances are stateless and disposable

Operational Questions

What happens if the API goes down?

Clusters keep running. They continue reconciling from their last-known state. Existing HelmReleases, Namespaces, and ClusterRoleBindings all remain in place and healthy.

What stops working:

  • New configuration changes are not picked up until the API recovers
  • The ResourceSetInputProvider status shows not-ready
  • Alerts should fire based on provider status conditions

This is a key advantage over push-based models — API downtime is an inconvenience, not an outage.

How do I roll back a bad change?

  1. Revert the API data — update the cluster document or catalog entry back to the previous state
  2. Wait for next poll — or force an immediate reconcile with kubectl annotate
  3. Flux reconciles — the ResourceSet re-renders with the reverted data, and Flux applies the diff

For template changes (in Git), use standard Git revert workflows. Flux picks up the reverted template on next reconcile.

How do I handle secrets?

The patches object is for non-sensitive configuration only (replica counts, feature flags, resource limits). For secrets:

  • Use the External Secrets Operator to sync secrets from a vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc.)
  • Reference Kubernetes Secrets in HelmRelease valuesFrom instead of ConfigMaps
  • Add an external-secrets resource type to the API to manage ESO ExternalSecret resources via the same phone-home pattern

Can I use this with existing Flux installations?

Yes. The ResourceSetInputProvider and ResourceSet are standard Flux Operator CRDs. They coexist with existing GitRepositories, HelmRepositories, Kustomizations, and HelmReleases.

You can adopt incrementally:

  1. Install the Flux Operator alongside existing Flux controllers
  2. Deploy providers and ResourceSets for one resource type (e.g., namespaces)
  3. Migrate additional resource types as confidence grows
  4. Existing Git-based Flux resources continue working unchanged

How does this compare to Helm value files per cluster?

| Aspect | Helm values per cluster | API-driven patching |
|---|---|---|
| Storage | YAML files in Git (one per cluster, or overlays) | Structured data in the API |
| Updating 100 clusters | 100 file edits + PR | Batch API call |
| Per-cluster customization | Overlay hierarchy (can get deeply nested) | Flat patches object per cluster per component |
| Dynamic values | Requires scripted Git commits | API call → next poll → reconciled |
| Review requirement | Git PR for every change (even scaling) | API auth for data changes; Git PR for template changes |
| Merge conflicts | Possible with concurrent PRs | Not possible — API handles concurrency |

Can I extend this beyond platform components?

Yes. The architecture is designed for it. Any Kubernetes resource type can be managed this way. See the Extending chapter for a step-by-step walkthrough.

Ideas that organizations have considered:

  • Network policies
  • Resource quotas and limit ranges
  • External secrets
  • Ingress routes and TLS certificates
  • Custom CRDs specific to the organization
  • Monitoring and alerting configurations (PrometheusRule, ServiceMonitor)

Each follows the same pattern: schema, endpoint, provider, template.

Performance & Scale

How many clusters can this support?

The API is stateless and the per-request cost is minimal (one data store read + one merge). Rough numbers:

| Clusters | Resource Types | Poll Interval | Requests/sec |
|---|---|---|---|
| 100 | 3 | 5 min | 1 |
| 500 | 3 | 5 min | 5 |
| 1,000 | 3 | 5 min | 10 |
| 5,000 | 3 | 5 min | 50 |
| 10,000 | 5 | 5 min | 167 |

Even at 10,000 clusters with 5 resource types, the load is ~167 req/sec — well within the capacity of a small API deployment. Add read replicas for HA, not for throughput.
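
These figures follow directly from clusters × resource types ÷ poll interval in seconds; a one-liner to reproduce them (awk is used only for the division and rounding):

```shell
# Steady-state poll load: clusters * resource_types / poll_interval_seconds.
poll_rps() {
  awk -v c="$1" -v t="$2" -v i="$3" 'BEGIN { printf "%.0f\n", c * t / i }'
}

poll_rps 100 3 300     # 100 clusters, 3 types, 5 min -> 1
poll_rps 10000 5 300   # 10,000 clusters, 5 types, 5 min -> 167
```

Shortening the poll interval scales the load linearly, so a 30-second interval multiplies every row by ten.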

What is the latency from API change to cluster reconciliation?

It depends on the poll interval configured on the ResourceSetInputProvider. The default is 5 minutes. For faster feedback:

  • Set fluxcd.controlplane.io/reconcileEvery: "30s" on the provider (the demo uses this)
  • Force immediate reconciliation by annotating the provider with fluxcd.controlplane.io/requestedAt
  • In practice, 5-minute intervals are fine for production — platform component changes are not latency-sensitive
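
Because a forced refresh means stamping the same annotation on both the provider and the ResourceSet, a small helper avoids repeating the two kubectl annotate commands. It assumes, as in the demo, that the provider and ResourceSet share a name:

```shell
# Force an immediate input refresh: stamp requestedAt on the provider and
# on the ResourceSet of the same name so both re-reconcile right away.
force_reconcile() {
  local name="$1" ns="${2:-flux-system}" ts
  ts="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
  kubectl annotate resourcesetinputprovider "$name" -n "$ns" \
    fluxcd.controlplane.io/requestedAt="$ts" --overwrite
  kubectl annotate resourceset "$name" -n "$ns" \
    fluxcd.controlplane.io/requestedAt="$ts" --overwrite
}
```

For example, `force_reconcile platform-components` replaces the two-command dance used throughout the demo walkthroughs.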

Does every cluster get the full catalog?

No. Each cluster only receives the components, namespaces, and rolebindings assigned to it in the cluster document. The API computes a cluster-specific response — a cluster with 5 components gets 5 inputs, not the entire catalog.