Flux ResourceSet — API-Driven GitOps
flux-resourceset is a repo containing an example API service that powers an API-driven, GitOps-based model for managing Kubernetes clusters at enterprise scale. Instead of a central management cluster pushing configuration to child clusters, each child cluster pulls its own desired state from this API — and Flux reconciles the difference.
This is still a GitOps-based model: the ResourceSet templates that define how resources are rendered live in Git and follow standard GitOps review workflows. The API adds a dynamic data layer on top — what each cluster should run is served by the API, while how it is deployed is governed by version-controlled templates. The combination preserves GitOps principles (declarative, versioned, continuously reconciled) while adding the operational flexibility that enterprise multi-cluster management demands.
The Problem
Traditional enterprise Kubernetes platforms suffer from:
- Slow provisioning — cluster creation taking weeks, not minutes
- State divergence — configuration management tools (Ansible, Terraform, Puppet, Salt, or custom automation scripts), CMDB databases, and actual cluster state drifting apart over time
- Manual release ceremonies — PRs, approvals, and tier-by-tier rollouts for every platform component change
- Scaling bottlenecks — centralized push-based management that breaks down at hundreds of clusters
- Infrastructure lock-in — tooling that assumes a specific cloud provider or VM provisioner, making hybrid and multi-cloud deployments painful
The Solution
This project implements a resource-driven, pull-based architecture where:
- A central API (this service) is the single source of truth for cluster configuration
- Each cluster’s Flux Operator phones home to fetch its desired state
- ResourceSet templates render Kubernetes resources from the API response
- Flux continuously reconciles — any API change is automatically applied
This model is infrastructure-agnostic. It works on bare-metal on-premises data centers, private cloud, public cloud (AWS EKS, Azure AKS, GCP GKE), edge locations, or any hybrid combination. The only requirement is that each cluster can make outbound HTTPS requests to the API endpoint.
```mermaid
graph TB
    API["flux-resourceset API<br/>(single source of truth)"]

    subgraph "Child Cluster 1"
        P1["ResourceSetInputProvider<br/>(polls every 5m)"]
        RS1["ResourceSet<br/>(renders templates)"]
        K1["Flux Kustomize/Helm<br/>(reconciles)"]
        P1 -->|"fetches inputs"| RS1
        RS1 -->|"creates resources"| K1
    end

    subgraph "Child Cluster 2"
        P2["ResourceSetInputProvider"]
        RS2["ResourceSet"]
        K2["Flux Kustomize/Helm"]
        P2 --> RS2 --> K2
    end

    subgraph "Child Cluster N"
        PN["ResourceSetInputProvider"]
        RSN["ResourceSet"]
        KN["Flux Kustomize/Helm"]
        PN --> RSN --> KN
    end

    P1 -->|"GET /clusters/{dns}/platform-components"| API
    P2 -->|"GET /clusters/{dns}/namespaces"| API
    PN -->|"GET /clusters/{dns}/rolebindings"| API
```
Key Upstream Projects
This architecture builds on two open-source projects:
- Flux Operator — provides the ResourceSet and ResourceSetInputProvider CRDs that power the templating and phone-home polling. The `ExternalService` input type is the foundation this architecture is built on.
- Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code. Firestone defines the resource schemas (`cluster`, `platform_component`, `namespace`, `rolebinding`) that drive code generation for this project.
What This Service Does
flux-resourceset reads cluster configuration data, merges per-cluster overrides with catalog defaults, and returns responses in the {"inputs": [...]} format that the Flux Operator’s ResourceSetInputProvider (ExternalService type) requires.
Each resource type gets its own endpoint:
| Endpoint | What It Returns |
|---|---|
| `GET /api/v2/flux/clusters/{dns}/platform-components` | HelmRelease + HelmRepository + ConfigMap inputs per component |
| `GET /api/v2/flux/clusters/{dns}/namespaces` | Namespace inputs with labels and annotations |
| `GET /api/v2/flux/clusters/{dns}/rolebindings` | ClusterRoleBinding inputs with subjects |
| `GET /api/v2/flux/clusters` | Cluster list for management plane provisioning |
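As an illustration of consuming these endpoints from client tooling, the sketch below fetches one resource type with Python's standard library. The host, cluster DNS, and token values are placeholders, not part of the project.

```python
import json
import urllib.request

# Placeholder values -- substitute your own API host, cluster DNS, and token.
API = "http://localhost:8080"
CLUSTER_DNS = "demo-cluster-01.k8s.example.com"
TOKEN = "my-token"

def build_url(api: str, cluster_dns: str, resource: str) -> str:
    """Build the Flux read-endpoint URL for one resource type."""
    return f"{api}/api/v2/flux/clusters/{cluster_dns}/{resource}"

def fetch_inputs(resource: str) -> list:
    """Fetch and unwrap the {"inputs": [...]} payload for one resource type."""
    req = urllib.request.Request(
        build_url(API, CLUSTER_DNS, resource),
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["inputs"]

# Against a running instance:
# for item in fetch_inputs("platform-components"):
#     print(item["id"])
```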
Key Concepts
| Concept | Description |
|---|---|
| Phone-home model | Clusters pull config; the API never pushes. Scales to thousands of clusters. |
| Resource-driven development | Define resources (clusters, components, namespaces) as structured data. Templates turn data into Kubernetes manifests. |
| Dynamic patching | Per-cluster, per-component value overrides without touching Git. Change a replica count in the API and watch Flux reconcile. |
| Catalog + overrides | Platform components live in a catalog with defaults. Each cluster can override oci_tag, component_path, or inject custom patches. |
| ExternalService contract | All responses follow {"inputs": [{"id": "...", ...}]} — the format Flux Operator requires. |
| Infrastructure-agnostic | Works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |
Quick Start
```sh
cd flux-resourceset
make demo      # Creates kind cluster, installs Flux, deploys API + demo data
make cli-demo  # Runs the CLI demo flow end-to-end
```
See the Local Demo chapter for full details.
System Overview
The architecture separates concerns into three layers: the data plane (where cluster config lives), the API plane (this service), and the cluster plane (Flux running on each child cluster).
High-Level Architecture
```mermaid
graph TB
    subgraph "Data Layer"
        DB[("Data Store<br/>(SQLite / In-Memory)")]
    end

    subgraph "API Layer"
        READ["flux-resourceset<br/>(read-only mode)"]
        CRUD["flux-resourceset<br/>(CRUD mode)"]
        CLI["flux-resourceset-cli"]
    end

    subgraph "Cluster Layer"
        subgraph "Child Cluster"
            RSIP["ResourceSetInputProvider<br/>type: ExternalService"]
            RS["ResourceSet<br/>(templates)"]
            HR["HelmRelease / Kustomization"]
            NS["Namespace"]
            RB["ClusterRoleBinding"]
        end
    end

    DB -->|"read"| READ
    DB <-->|"read/write"| CRUD
    CLI -->|"CRUD operations"| CRUD
    RSIP -->|"polls"| READ
    RSIP -->|"inputs"| RS
    RS -->|"renders"| HR
    RS -->|"renders"| NS
    RS -->|"renders"| RB
```
Component Roles
Data Store
By default, this is SQLite (configured via DATABASE_URL). For lightweight/dev workflows it can run in-memory (STORE_BACKEND=memory) using data/seed.json as initial state.
The store holds four logical resource sets:
- clusters — each cluster’s full configuration: assigned components, namespaces, rolebindings, and per-component patches
- platform_components — component catalog entries with defaults, OCI URLs/tags, and dependencies
- namespaces — reusable namespace definitions referenced by clusters
- rolebindings — reusable RBAC rolebinding definitions referenced by clusters
API Service (flux-resourceset)
A Rust service built with axum that operates in two modes:
| Mode | Purpose | Endpoints |
|---|---|---|
| `read-only` | Flux polling — high concurrency, minimal resource usage | `/api/v2/flux/...`, `/health`, `/ready` |
| `crud` | Operator/CLI access — full CRUD for managing cluster state | All read endpoints + `/clusters`, `/platform_components`, `/namespaces`, `/rolebindings` |
The read-only mode is designed to run as a multi-replica deployment serving cluster polls. The CRUD mode is for operators and CI/CD pipelines that need to modify cluster configuration.
CLI (flux-resourceset-cli)
A command-line tool for interacting with the CRUD API. Supports listing, creating, and patching resources. Used for demos and operational workflows.
Flux Operator (on each cluster)
Each cluster runs:
- ResourceSetInputProvider — calls the API on a schedule and fetches `{"inputs": [...]}`
- ResourceSet — takes the inputs and renders Kubernetes manifests from templates
- Flux controllers — reconcile the rendered manifests (HelmRelease, Kustomization, Namespace, etc.)
Data Flow
```mermaid
sequenceDiagram
    participant Operator as Operator / CLI
    participant API as flux-resourceset (CRUD)
    participant DB as Data Store
    participant ReadAPI as flux-resourceset (read-only)
    participant Cluster as Child Cluster (Flux)

    Operator->>API: PATCH /clusters/demo-cluster-01<br/>{"patches": {"podinfo": {"replicaCount": "3"}}}
    API->>DB: Update cluster document
    API-->>Operator: 200 OK
    Note over Cluster: Every 5 minutes (or on-demand)
    Cluster->>ReadAPI: GET /api/v2/flux/clusters/{dns}/platform-components
    ReadAPI->>DB: Fetch cluster + catalog docs
    DB-->>ReadAPI: Cluster doc + component catalog
    ReadAPI->>ReadAPI: Merge overrides with catalog defaults
    ReadAPI-->>Cluster: {"inputs": [{...component with patches...}]}
    Cluster->>Cluster: ResourceSet renders HelmRelease with patched values
    Cluster->>Cluster: Flux reconciles — podinfo scales to 3 replicas
```
Why This Architecture
vs. Push-Based (ArgoCD ApplicationSets, central Flux)
| Concern | Push-based | Phone-home (this) |
|---|---|---|
| Scalability | Management cluster must maintain connections to all children | Each cluster independently polls; API is stateless |
| Failure blast radius | Management cluster outage = all clusters lose reconciliation | API outage = clusters keep running last-known state |
| Network requirements | Management cluster needs outbound access to all clusters | Clusters need outbound access to one API endpoint |
| Credential management | Management cluster holds kubeconfigs for all clusters | Each cluster holds one bearer token |
vs. Git-per-Cluster
| Concern | Git-per-cluster | API-driven (this) |
|---|---|---|
| Updating 500 clusters | 500 PRs or complex monorepo tooling | One API call to update the component catalog |
| Per-cluster overrides | Branch strategies or overlay directories | First-class patches object per cluster |
| Audit trail | Git history | API audit log + Git history for templates |
| Dynamic response | Static YAML files | Merge logic computes cluster-specific state |
vs. Direct Kubernetes API Access
A common question is: why not have operators kubectl apply directly, or build tooling that talks to the Kubernetes API on each cluster? See the FAQ for a detailed answer. The short version: a purpose-built API gives you a single control point with business logic, validation, audit logging, and integration hooks — things the raw Kubernetes API does not provide at fleet scale.
Infrastructure Agnostic
This architecture has no dependency on a specific cloud provider, VM provisioner, or Kubernetes distribution. The phone-home pattern requires only one thing: outbound HTTPS from each cluster to the API.
```mermaid
graph TB
    API["flux-resourceset API"]

    subgraph "On-Premises Data Center"
        OP1["Bare-metal cluster"]
        OP2["VMware vSphere cluster"]
    end

    subgraph "Public Cloud"
        AWS["AWS EKS"]
        AZ["Azure AKS"]
        GCP["GCP GKE"]
    end

    subgraph "Edge"
        E1["Edge location 1"]
        E2["Edge location 2"]
    end

    OP1 & OP2 -->|"HTTPS"| API
    AWS & AZ & GCP -->|"HTTPS"| API
    E1 & E2 -->|"HTTPS"| API
```
| Environment | How It Works |
|---|---|
| On-prem bare metal | Clusters provisioned via PXE boot, cloud-init, or immutable OS images. Flux bootstrap manifests pre-installed or applied post-boot. |
| On-prem VMs | VMware, KVM, Hyper-V, or any hypervisor. Same bootstrap pattern — inject identity, let Flux phone home. |
| Public cloud managed K8s | EKS, AKS, GKE — deploy Flux Operator as an add-on or Helm chart. Providers and ResourceSets applied via GitOps or cluster bootstrap. |
| Edge / remote sites | Lightweight clusters (k3s, k0s, MicroK8s) at edge locations. Phone home over VPN or direct HTTPS. |
| Hybrid | Mix any of the above. Each cluster phones home to the same API regardless of where it runs. |
The cluster provisioning mechanism is completely decoupled from the platform component management. Whether you use Terraform, Crossplane, Cluster API, custom scripts, or manual provisioning — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop takes over.
Phone-Home Model
The phone-home model is the core architectural pattern. Every child cluster is self-managing — it phones home to the API to discover its desired state, then reconciles locally. The provisioning layer’s only job is creating the cluster infrastructure and injecting a bootstrap identity. After that, the child cluster is autonomous.
How It Works
```mermaid
sequenceDiagram
    participant Mgmt as Management Cluster
    participant VM as Child Cluster VMs
    participant Flux as Flux Operator
    participant API as flux-resourceset API

    Mgmt->>VM: Provision cluster infrastructure<br/>Inject cluster-identity ConfigMap
    VM->>Flux: Cluster boots → Flux Operator starts
    Flux->>Flux: Reads cluster-identity ConfigMap<br/>(CLUSTER_NAME, CLUSTER_DNS, ENVIRONMENT)
    loop Every reconcile interval
        Flux->>API: GET /clusters/{CLUSTER_DNS}/platform-components
        API-->>Flux: {"inputs": [...components...]}
        Flux->>Flux: ResourceSet renders HelmRelease per component
        Flux->>Flux: Flux reconciles rendered resources
        Flux->>API: GET /clusters/{CLUSTER_DNS}/namespaces
        API-->>Flux: {"inputs": [...namespaces...]}
        Flux->>Flux: ResourceSet renders Namespace resources
        Flux->>API: GET /clusters/{CLUSTER_DNS}/rolebindings
        API-->>Flux: {"inputs": [...bindings...]}
        Flux->>Flux: ResourceSet renders ClusterRoleBinding resources
    end
    Note over Mgmt: Management cluster is out of the loop<br/>for all platform component management
```
Bootstrap Flow
The bootstrap sequence is designed so that every cluster starts identically and differentiates itself only through the API response:
1. Cluster provisioning — The infrastructure layer creates the cluster. This could be VMs from immutable OS images, cloud-managed Kubernetes (EKS, AKS, GKE), bare-metal nodes via PXE boot, or any other provisioning method. The Flux Operator bootstrap manifests are pre-installed in the image or applied post-boot.

2. Identity injection — A `cluster-identity` ConfigMap is the only cluster-specific data injected during provisioning:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cluster-identity
     namespace: flux-system
   data:
     CLUSTER_NAME: "us-east-prod-01"
     CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
     ENVIRONMENT: "prod"
     INTERNAL_API_URL: "https://internal-api.internal.example.com"
   ```

3. Flux bootstrap — The cluster boots. Pre-installed or applied manifests start the Flux Operator and deploy the ResourceSetInputProviders + ResourceSets.

4. Phone home — Each ResourceSetInputProvider calls the API using the cluster’s DNS name from the identity ConfigMap. The API returns that cluster’s specific configuration.

5. Self-reconciliation — Flux renders and reconciles. From this point forward, the cluster is self-managing.
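To make the identity-to-URL wiring concrete, here is a small Python sketch (illustrative only — the real polling is done by the Flux Operator, and the helper below is not part of it). The dictionary keys mirror the `data` keys of the cluster-identity ConfigMap shown above.

```python
# Illustrative only: the real polling is done by the Flux Operator.
# The identity dict mirrors the cluster-identity ConfigMap's data keys.

def phone_home_urls(identity: dict) -> dict:
    """Map each resource type to the URL its provider polls."""
    base = (
        f'{identity["INTERNAL_API_URL"]}'
        f'/api/v2/flux/clusters/{identity["CLUSTER_DNS"]}'
    )
    return {
        resource: f"{base}/{resource}"
        for resource in ("platform-components", "namespaces", "rolebindings")
    }
```

Every cluster runs the same code; only the injected identity values differ, which is what lets clusters start identically and differentiate purely through the API response.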
What Happens When the API Is Unreachable
The phone-home model degrades gracefully:
| Scenario | Cluster Behavior |
|---|---|
| API down for minutes | ResourceSetInputProvider goes not-ready. Existing Flux resources continue reconciling from cached state. No disruption. |
| API down for hours | Same — clusters keep running. They just cannot pick up new configuration changes. |
| API returns changed data | On next successful poll, ResourceSet re-renders. Flux applies the diff. |
| API returns empty inputs | Flux garbage-collects all resources the ResourceSet previously created. This is the decommission path. |
Separation of Concerns
```mermaid
graph LR
    subgraph "Provisioning Layer"
        A["Cluster Provisioning<br/>(Terraform, Cluster API,<br/>cloud CLI, PXE, etc.)"]
        B["DNS / Networking<br/>Setup"]
        C["Identity Injection<br/>(cluster-identity ConfigMap)"]
    end

    subgraph "API Layer"
        D["Single Source of Truth<br/>for all cluster configuration"]
    end

    subgraph "Child Cluster"
        E["Platform component<br/>deployment & reconciliation"]
        F["Namespace & RBAC<br/>management"]
    end

    A --> C
    B --> C
    D -.->|"polled by"| E
    D -.->|"polled by"| F
```
The provisioning layer never deploys platform components to child clusters. It creates infrastructure and injects identity. The child cluster owns its own desired state by polling the API. This separation means the provisioning tooling (whether Terraform, Cluster API, Crossplane, custom scripts, or a management cluster) has no ongoing role in platform component management.
Per-Resource-Type Providers
Each resource type gets its own ResourceSetInputProvider + ResourceSet pair. This separation ensures:
- Independent reconciliation — a namespace change does not trigger platform component re-rendering
- Independent failure — if one provider fails, others continue working
- Clear templates — each ResourceSet template is focused on one resource type
| Resource Type | Provider Name | Endpoint |
|---|---|---|
| Platform components | platform-components | /api/v2/flux/clusters/{dns}/platform-components |
| Namespaces | namespaces | /api/v2/flux/clusters/{dns}/namespaces |
| Role bindings | rolebindings | /api/v2/flux/clusters/{dns}/rolebindings |
All providers are pre-installed in every cluster’s bootstrap manifests. The cluster does not need to know what resource types exist — it polls all of them from boot.
Resource-Driven Development
Resource-driven development is the design philosophy behind this architecture. Instead of writing imperative scripts or maintaining per-cluster YAML, you define resources as structured data and let templates + reconciliation handle the rest.
The Idea
Every entity in the platform is a resource with a schema:
```mermaid
erDiagram
    CLUSTER ||--o{ COMPONENT_REF : "has platform_components"
    CLUSTER ||--o{ NAMESPACE_REF : "has namespaces"
    CLUSTER ||--o{ ROLEBINDING_REF : "has rolebindings"
    CLUSTER ||--o{ PATCH : "has patches"
    COMPONENT_REF }o--|| CATALOG_ENTRY : "references"
    NAMESPACE_REF }o--|| NAMESPACE_DEF : "references"
    ROLEBINDING_REF }o--|| ROLEBINDING_DEF : "references"

    CLUSTER {
        string id PK
        string cluster_name
        string cluster_dns
        string environment
    }
    COMPONENT_REF {
        string id FK
        boolean enabled
        string oci_tag "nullable override"
        string component_path "nullable override"
    }
    CATALOG_ENTRY {
        string id PK
        string component_path
        string component_version
        string oci_url
        string oci_tag
        boolean cluster_env_enabled
        string[] depends_on
    }
    NAMESPACE_REF {
        string id
    }
    ROLEBINDING_REF {
        string id
    }
    NAMESPACE_DEF {
        string id PK
        object labels
        object annotations
    }
    ROLEBINDING_DEF {
        string id PK
        string role
        object[] subjects
    }
    PATCH {
        string component_id FK
        object key_values
    }
```
`cluster.namespaces` and `cluster.rolebindings` are reference arrays (`id` only). Full namespace/rolebinding payloads live in their own definition resources and are resolved during merge.
Resources are declared, not scripted. The API merges them. Templates render them. Flux reconciles them.
Three-Layer Separation
The architecture cleanly separates what from how from where:
| Layer | Responsibility | Who Owns It | Example |
|---|---|---|---|
| Data | What should exist on each cluster | Platform operators via API/CLI | “Cluster X should have cert-manager v1.14.0 with 3 replicas” |
| Templates | How resources are rendered into Kubernetes manifests | Platform engineers via Git | ResourceSet template that turns an input into a HelmRelease |
| Reconciliation | Where and when resources are applied | Flux Operator (automated) | Flux detects drift and applies the diff |
This separation means:
- Operators change cluster state by updating data (API calls), not by writing YAML
- Engineers change how things are deployed by updating templates (Git PRs), not by touching every cluster
- Flux handles the convergence loop — no manual `kubectl apply`, no configuration management playbooks, no custom deployment scripts
How a Change Flows Through the System
Example: Adding a new platform component to 50 clusters
Traditional approach:
- Write Helm values for 50 clusters (or complex overlay structure)
- Open PR to add component to each cluster’s directory
- Wait for PR review and merge
- Watch tier-by-tier rollout
- Debug failures per-cluster
Resource-driven approach:
- Add the component to the catalog (one API call)
- Add a component reference to each cluster’s `platform_components` array (one API call per cluster, or a batch script)
- Done — Flux picks it up on next poll
```mermaid
flowchart LR
    A["API call:<br/>Add component<br/>to catalog"] --> B["API call:<br/>Add component_ref<br/>to cluster doc"]
    B --> C["Next poll cycle:<br/>Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRepo + HelmRelease"]
    D --> E["Flux reconciles:<br/>component installed"]
```
Example: Patching a component value on one cluster
```mermaid
flowchart LR
    A["CLI:<br/>patch-component podinfo<br/>--set replicaCount=3"] --> B["API updates<br/>cluster.patches.podinfo"]
    B --> C["Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRelease with<br/>valuesFrom ConfigMap"]
    D --> E["Flux reconciles:<br/>podinfo scales to 3"]
```
No Git PR. No pipeline. The data change flows through the system automatically.
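The data change in this flow is just an HTTP call. As an illustrative sketch with Python's standard library (the host, token, and cluster ID are placeholders; the PATCH shape mirrors the sequence diagram earlier in this document):

```python
import json
import urllib.request

def build_patch_body(component: str, values: dict) -> dict:
    """Shape the per-cluster patches object for one component."""
    return {"patches": {component: values}}

def patch_cluster(api: str, token: str, cluster_id: str,
                  component: str, values: dict) -> int:
    """Send the data change as a PATCH request (placeholder host/token)."""
    req = urllib.request.Request(
        f"{api}/clusters/{cluster_id}",
        data=json.dumps(build_patch_body(component, values)).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Against a running CRUD-mode instance:
# patch_cluster("http://localhost:8080", "my-token",
#               "demo-cluster-01", "podinfo", {"replicaCount": "3"})
```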
Resource Schemas as API Contracts
Each resource type has a defined schema managed via Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code generation artifacts.
The schemas:
- cluster (v2) — the full cluster document with arrays of component refs, namespace refs, rolebinding refs, and a patches object
- platform_component (v1) — the catalog entry with OCI URLs, versions, dependencies
- namespace (v1) — namespace with labels and annotations
- rolebinding (v1) — role binding with subjects
These schemas are the single source of truth for:
- OpenAPI spec generation (`openapi/openapi.yaml`) — used for API documentation and client generation
- Rust model generation (`src/models/`, `src/apis/`) — the structs the API service uses
- CLI code generation (`src/generated/cli/`) — the CLI commands for each resource type
When a schema changes, `make generate` regenerates all downstream artifacts. This ensures the API, CLI, and documentation stay in sync with the resource definitions. See the Firestone documentation for the full schema language and generator options.
Benefits for Enterprise
Auditability
Every state change goes through the API. The API can log who changed what, when. Combined with Git history for templates, you have a full audit trail.
Consistency
The merge logic guarantees that every cluster gets a consistent, computed response. No hand-edited YAML files that drift.
Velocity
Operators can change cluster state in seconds. No PR cycles for operational changes. Reserve Git PRs for template/structural changes.
Testability
Because resources are structured data, you can:
- Validate schemas before applying
- Unit test merge logic
- Integration test API responses against the ExternalService contract
- Dry-run template rendering
Separation of Permissions
- Template changes (how things deploy) require Git PR review
- Data changes (what is deployed where) require API auth tokens
- Reconciliation is automated — no human in the loop
API Reference
All endpoints return JSON. Flux-facing endpoints return the {"inputs": [...]} structure required by the ResourceSetInputProvider ExternalService contract. CRUD endpoints follow standard REST conventions.
Authentication
All endpoints require a Bearer token in the Authorization header.
| Mode | Read Token | Write Token |
|---|---|---|
| `read-only` | `AUTH_TOKEN` env var | N/A (no write endpoints) |
| `crud` | `AUTH_TOKEN` env var | `CRUD_AUTH_TOKEN` env var (falls back to `AUTH_TOKEN`) |
```sh
curl -H "Authorization: Bearer $AUTH_TOKEN" http://localhost:8080/health
```
Flux Read Endpoints
These endpoints are consumed by Flux Operator’s ResourceSetInputProvider. They follow the ExternalService contract.
ExternalService Contract
Every response must satisfy:
- Top-level `inputs` array
- Each item has a unique string `id`
- Response body under 900 KiB
- All JSON value types (strings, numbers, booleans, arrays, objects) are preserved in templates
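A client-side check of these rules can be sketched in a few lines of Python (illustrative only; the Flux Operator performs its own validation):

```python
import json

def validate_external_service_response(body: bytes) -> list:
    """Return a list of ExternalService contract violations (empty = valid)."""
    problems = []
    if len(body) >= 900 * 1024:
        problems.append("response body is 900 KiB or larger")
    doc = json.loads(body)
    inputs = doc.get("inputs")
    if not isinstance(inputs, list):
        problems.append('missing top-level "inputs" array')
        return problems
    ids = [item.get("id") for item in inputs]
    if any(not isinstance(i, str) for i in ids):
        problems.append("every input needs a string id")
    if len(set(ids)) != len(ids):
        problems.append("input ids must be unique")
    return problems
```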
GET /api/v2/flux/clusters/{cluster_dns}/platform-components
Returns platform components assigned to a cluster, with catalog defaults merged and per-cluster overrides applied.
Path parameters:
| Parameter | Type | Description |
|---|---|---|
| `cluster_dns` | string | The cluster’s DNS name (e.g., `demo-cluster-01.k8s.example.com`) |
Response:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "cert-manager",
      "component_version": "latest",
      "cluster_env_enabled": false,
      "depends_on": [],
      "enabled": true,
      "patches": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      },
      "source": {
        "oci_url": "https://charts.jetstack.io",
        "oci_tag": "latest"
      }
    }
  ]
}
```
Field reference:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique component identifier, used as Flux resource name suffix |
| `component_path` | string | Chart name or path within OCI artifact. Cluster override takes precedence over catalog default |
| `component_version` | string | Upstream version. `"latest"` means no version pinning |
| `cluster_env_enabled` | boolean | If true, ResourceSet template appends `/{environment}` to the path |
| `depends_on` | string[] | Component IDs that must be healthy first. Empty = no dependencies |
| `enabled` | boolean | `false` causes Flux to garbage-collect the component |
| `patches` | object | Per-cluster key-value overrides, injected via HelmRelease `valuesFrom` |
| `cluster.name` | string | Cluster identifier |
| `cluster.dns` | string | Cluster FQDN |
| `cluster.environment` | string | Tier: `dev`, `qa`, `uat`, `prod` |
| `source.oci_url` | string | Helm repository or OCI registry URL |
| `source.oci_tag` | string | Chart/artifact version tag. Cluster override takes precedence |
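For client tooling, the field reference can be mirrored as a typed structure. The following is an illustrative Python model, not generated project code; the `dict[str, str]` type for `patches` follows the string-valued examples in this chapter.

```python
from typing import TypedDict

class ClusterBlock(TypedDict):
    name: str
    dns: str
    environment: str

class Source(TypedDict):
    oci_url: str
    oci_tag: str

class ComponentInput(TypedDict):
    """One item of the platform-components "inputs" array."""
    id: str
    component_path: str
    component_version: str
    cluster_env_enabled: bool
    depends_on: list[str]
    enabled: bool
    patches: dict[str, str]
    cluster: ClusterBlock
    source: Source
```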
GET /api/v2/flux/clusters/{cluster_dns}/namespaces
Returns namespaces assigned to a cluster.
Response:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "labels": { "app": "cert-manager" },
      "annotations": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}
```
GET /api/v2/flux/clusters/{cluster_dns}/rolebindings
Returns role bindings assigned to a cluster.
Response:
```json
{
  "inputs": [
    {
      "id": "platform-admins",
      "role": "cluster-admin",
      "subjects": [
        {
          "kind": "Group",
          "name": "platform-team",
          "apiGroup": "rbac.authorization.k8s.io"
        }
      ],
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}
```
GET /api/v2/flux/clusters
Returns all clusters. Used by management cluster provisioners.
Response:
```json
{
  "inputs": [
    {
      "id": "demo-cluster-01",
      "cluster_name": "demo-cluster-01",
      "cluster_dns": "demo-cluster-01.k8s.example.com",
      "environment": "dev"
    }
  ]
}
```
CRUD Endpoints
Available when API_MODE=crud. These follow standard REST patterns.
Clusters
| Method | Path | Description |
|---|---|---|
| GET | `/clusters` | List all clusters |
| POST | `/clusters` | Create a cluster |
| GET | `/clusters/{id}` | Get cluster by ID |
| PUT | `/clusters/{id}` | Update a cluster |
| DELETE | `/clusters/{id}` | Delete a cluster |
Cluster payload notes:
- `platform_components[]` entries are references with per-cluster override fields (`id`, `enabled`, optional `oci_tag`, optional `component_path`).
- `namespaces[]` entries are reference objects (`id` only).
- `rolebindings[]` entries are reference objects (`id` only).
Platform Components
| Method | Path | Description |
|---|---|---|
| GET | `/platform_components` | List all catalog components |
| POST | `/platform_components` | Create a catalog entry |
| GET | `/platform_components/{id}` | Get component by ID |
| PUT | `/platform_components/{id}` | Update a catalog entry |
| DELETE | `/platform_components/{id}` | Delete a catalog entry |
Namespaces
| Method | Path | Description |
|---|---|---|
| GET | `/namespaces` | List all namespace definitions |
| POST | `/namespaces` | Create a namespace definition |
| GET | `/namespaces/{id}` | Get namespace by ID |
| PUT | `/namespaces/{id}` | Update a namespace definition |
| DELETE | `/namespaces/{id}` | Delete a namespace definition |
Rolebindings
| Method | Path | Description |
|---|---|---|
| GET | `/rolebindings` | List all rolebinding definitions |
| POST | `/rolebindings` | Create a rolebinding definition |
| GET | `/rolebindings/{id}` | Get rolebinding by ID |
| PUT | `/rolebindings/{id}` | Update a rolebinding definition |
| DELETE | `/rolebindings/{id}` | Delete a rolebinding definition |
Service Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe — returns `{"status": "ok"}` |
| GET | `/ready` | Readiness probe endpoint — currently returns `{"status": "ok"}` |
| GET | `/openapi.yaml` | OpenAPI 3.0 specification document |
Error Responses
| Status | Condition |
|---|---|
| `401 Unauthorized` | Missing or invalid bearer token |
| `404 Not Found` | Cluster DNS or resource ID not found |
| `500 Internal Server Error` | Data store connection error |
Merge Logic
The merge logic is the critical path in the API. It takes raw cluster documents and catalog entries and produces the computed response that Flux consumes. Understanding the merge is key to understanding the entire system.
Platform Components Merge
This is the most complex merge. It combines three data sources into a single response:
```mermaid
flowchart TD
    A["Cluster Document"] --> D["Merge Logic"]
    B["Component Catalog"] --> D
    C["Cluster Patches"] --> D
    D --> E["Flux Response<br/>{inputs: [...]}"]
    A -.- A1["platform_components[]<br/>per-cluster overrides"]
    B -.- B1["Default oci_tag, component_path,<br/>oci_url, depends_on"]
    C -.- C1["patches[component_id]<br/>key-value overrides"]
```
Merge Rules
For each component in the cluster’s platform_components array:
| Field | Source | Rule |
|---|---|---|
| `id` | Cluster component ref | Passed through |
| `enabled` | Cluster component ref | Passed through |
| `component_path` | Cluster override OR catalog default | Cluster override wins if non-null |
| `component_version` | Catalog | Always from catalog |
| `cluster_env_enabled` | Catalog | Always from catalog (template handles path appending) |
| `source.oci_url` | Catalog | Always from catalog |
| `source.oci_tag` | Cluster override OR catalog default | Cluster override wins if non-null |
| `depends_on` | Catalog | Always from catalog |
| `patches` | Cluster `patches[component_id]` | Empty `{}` if no patches for this component |
| `cluster.name` | Cluster doc | From cluster’s `cluster_name` |
| `cluster.dns` | Cluster doc | From cluster’s `cluster_dns` |
| `cluster.environment` | Cluster doc | From cluster’s `environment` |
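The service implements this merge in Rust; purely as an illustration, the rules in the table can be sketched in Python as a single function over one component reference, its catalog entry, and the cluster document:

```python
def merge_component(ref: dict, catalog: dict, cluster: dict) -> dict:
    """Sketch of the merge rules: a non-null cluster override wins,
    otherwise the catalog default applies."""
    return {
        "id": ref["id"],
        "enabled": ref["enabled"],
        # Cluster override wins if non-null, else catalog default
        "component_path": ref.get("component_path") or catalog["component_path"],
        # Always from catalog
        "component_version": catalog.get("component_version"),
        "cluster_env_enabled": catalog.get("cluster_env_enabled"),
        "depends_on": catalog.get("depends_on", []),
        # Empty {} if no patches for this component
        "patches": cluster.get("patches", {}).get(ref["id"], {}),
        "cluster": {
            "name": cluster["cluster_name"],
            "dns": cluster["cluster_dns"],
            "environment": cluster["environment"],
        },
        "source": {
            "oci_url": catalog["oci_url"],
            "oci_tag": ref.get("oci_tag") or catalog["oci_tag"],
        },
    }
```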
Merge Example
Given this cluster document:
```json
{
  "cluster_name": "us-east-prod-01",
  "cluster_dns": "us-east-prod-01.k8s.example.com",
  "environment": "prod",
  "platform_components": [
    { "id": "cert-manager", "enabled": true, "oci_tag": null, "component_path": null },
    { "id": "grafana", "enabled": true, "oci_tag": "v1.0.0-1", "component_path": "observability/grafana/17.1.0" }
  ],
  "patches": {
    "grafana": { "GRAFANA_REPLICAS": "3" }
  }
}
```
And this catalog:
```json
[
  { "_id": "cert-manager", "component_path": "core/cert-manager/1.14.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": [] },
  { "_id": "grafana", "component_path": "observability/grafana/17.0.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": ["cert-manager"] }
]
```
The merge produces:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "core/cert-manager/1.14.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0" },
      "patches": {},
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    },
    {
      "id": "grafana",
      "component_path": "observability/grafana/17.1.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0-1" },
      "depends_on": ["cert-manager"],
      "patches": { "GRAFANA_REPLICAS": "3" },
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    }
  ]
}
```
Notice:
- cert-manager uses catalog defaults for everything (cluster overrides are null)
- grafana uses cluster overrides for `oci_tag` (`v1.0.0-1`) and `component_path` (`observability/grafana/17.1.0`)
- grafana gets the per-cluster patch (`GRAFANA_REPLICAS: "3"`)
Namespaces Merge
Namespaces now use a reference + lookup model:
```mermaid
flowchart TD
    A["Cluster Document"] --> C["Merge Logic"]
    B["Namespace Definitions"] --> C
    C --> D["Flux Response"]
    A -.- A1["namespaces[]<br/>id references"]
    B -.- B1["id, labels, annotations"]
    D -.- D1["Each namespace gets<br/>cluster block nested in"]
```
Merge steps:
- Read `cluster.namespaces[]` as ID references.
- Resolve each ID from the namespace definitions store.
- Return the resolved namespace payload plus a nested `cluster` block (`name`, `dns`, `environment`).
- Any missing referenced IDs are skipped in Flux response generation.
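The steps above amount to a dictionary lookup with silent skipping — a minimal sketch in illustrative Python (field names are assumed from the examples in this document, not taken from the actual implementation):

```python
# Reference + lookup merge: resolve namespace IDs against the definitions
# store, skip missing IDs, and nest the cluster block into each result.
def resolve_namespaces(cluster, definitions):
    defs = {d["id"]: d for d in definitions}
    cluster_block = {
        "name": cluster["name"],
        "dns": cluster["dns"],
        "environment": cluster["environment"],
    }
    inputs = []
    for ns_id in cluster["namespaces"]:
        ns = defs.get(ns_id)
        if ns is None:
            continue  # missing references are skipped, not errored
        inputs.append({**ns, "cluster": cluster_block})
    return {"inputs": inputs}
```

The rolebindings merge in the next section follows the same shape with a different definitions store.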
Rolebindings Merge
Rolebindings follow the same pattern as namespaces:
- Read `cluster.rolebindings[]` as ID references.
- Resolve each ID from the rolebinding definitions store.
- Return the resolved rolebinding payload (`id`, `role`, `subjects[]`) plus a nested `cluster` block.
- Any missing referenced IDs are skipped in Flux response generation.
Why Merge Matters
The merge logic is what makes this system more than a simple proxy. It enables:
- Catalog defaults — define a component once, inherit everywhere
- Per-cluster overrides — pin a specific cluster to a hotfix version without affecting others
- Per-cluster patches — inject environment-specific values without touching the component definition
- Computed responses — the cluster gets exactly the state it needs, computed from multiple data sources
Without the merge, you would need to duplicate the full component definition per cluster — which is exactly the problem this architecture solves.
Configuration & Deployment
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `API_MODE` | no | `read-only` | Runtime mode: `read-only` or `crud` |
| `STORE_BACKEND` | no | `sqlite` | Data backend: `sqlite` or `memory` |
| `DATABASE_URL` | no | `sqlite://data/flux-resourceset.db?mode=rwc` | SQLite DSN when `STORE_BACKEND=sqlite` |
| `AUTH_TOKEN` | yes | — | Bearer token for read routes |
| `CRUD_AUTH_TOKEN` | no | `AUTH_TOKEN` | Bearer token for write routes in CRUD mode |
| `SEED_FILE` | no | `data/seed.json` | Seed data file loaded at startup |
| `OPENAPI_FILE` | no | `openapi/openapi.yaml` | OpenAPI document served at `/openapi.yaml` |
| `LISTEN_ADDR` | no | `0.0.0.0:8080` | Bind address |
| `RUST_LOG` | no | unset | Tracing filter directive |
Runtime Modes
read-only
The default mode. Serves only Flux read endpoints (/api/v2/flux/...) and service endpoints (/health, /ready, /openapi.yaml). Designed for high-concurrency polling from many clusters.
API_MODE=read-only AUTH_TOKEN=my-token cargo run
crud
Full CRUD mode. Includes all read endpoints plus REST endpoints for clusters, platform_components, namespaces, and rolebindings. Used by operators and CI/CD pipelines.
API_MODE=crud AUTH_TOKEN=read-token CRUD_AUTH_TOKEN=write-token cargo run
Production Deployment
Kubernetes Deployment (read-only)
apiVersion: apps/v1
kind: Deployment
metadata:
name: flux-api-read
namespace: flux-system
spec:
replicas: 2
selector:
matchLabels:
app: flux-api-read
template:
metadata:
labels:
app: flux-api-read
spec:
containers:
- name: flux-api
image: flux-resourceset:latest
ports:
- containerPort: 8080
env:
- name: API_MODE
value: "read-only"
- name: STORE_BACKEND
value: "sqlite"
- name: DATABASE_URL
value: "sqlite:///var/lib/flux-resourceset/flux-resourceset.db?mode=rwc"
- name: SEED_FILE
value: "/seed/seed.json"
- name: AUTH_TOKEN
valueFrom:
secretKeyRef:
name: internal-api-token
key: token
- name: RUST_LOG
value: "info"
resources:
requests:
cpu: 50m
memory: 32Mi
limits:
cpu: 200m
memory: 64Mi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 2
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 2
periodSeconds: 5
Resource requests are deliberately small — Rust’s efficiency means this service uses minimal resources. Run 2+ replicas for high availability, not for throughput.
Performance Characteristics
Each request does a data store lookup and a merge. Expected latency is sub-millisecond for the in-memory backend and typically single-digit milliseconds for SQLite on local SSD.
| Clusters | Poll Interval | Requests/sec |
|---|---|---|
| 50 | 5 min | 0.17 |
| 200 | 5 min | 0.67 |
| 1,000 | 5 min | 3.3 |
| 5,000 | 5 min | 16.7 |
Even at 5,000 clusters with three resource types each, the load is ~50 req/sec — trivial for a Rust/axum service.
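The load figures follow from simple arithmetic — each cluster hits each resource-type endpoint once per poll interval:

```python
# Polling load arithmetic: total request rate scales linearly with fleet size.
def requests_per_second(clusters: int, endpoints: int, interval_seconds: int) -> float:
    return clusters * endpoints / interval_seconds
```

At 5,000 clusters, three endpoints, and a 5-minute interval this yields 50 req/sec, matching the figure above.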
Build Commands
cargo build # Build API + CLI
cargo build --bin flux-resourceset-cli # Build CLI only
cargo test # Run all tests
cargo clippy -- -D warnings # Lint
cargo fmt # Format
Docker
make docker-build # Build container image
Code Generation
The project uses Firestone for schema-driven code generation:
make generate
This regenerates:
- `openapi/openapi.yaml` — OpenAPI 3.0 spec
- `src/models/` — Rust model structs
- `src/apis/` — Rust API client modules
- `src/generated/cli/` — CLI command modules
ResourceSet Templates
ResourceSet templates are the bridge between API data and Kubernetes resources. They use the Flux Operator’s templating engine to render manifests from the {"inputs": [...]} response.
Upstream reference: See the full ResourceSet CRD documentation for all available spec fields, status conditions, and advanced features like inventory tracking and garbage collection.
Template Syntax
ResourceSet uses `<<` and `>>` as delimiters (not `{{` and `}}`). This avoids conflicts with Helm templates and Go templates in the rendered YAML.
Key template functions:
- `<< inputs.field >>` — access input fields
- `<< inputs.nested.field >>` — access nested objects
- `<< inputs.field | slugify >>` — slugify a string for use in Kubernetes names
- `<<- range $k, $v := inputs.object >>` — iterate over object keys
- `<<- range $item := inputs.array >>` — iterate over arrays
- `<<- if inputs.field >>` — conditional rendering
- `<<- if ne inputs.field "value" >>` — conditional with comparison
- `<< inputs.object | toYaml | nindent N >>` — convert to YAML with indentation
Platform Components Template
This is the most complex template. For each component input, it renders up to three resources:
flowchart TD
I["Input from API<br/>(one per component)"] --> CM{"Has patches?"}
CM -->|yes| ConfigMap["ConfigMap<br/>values-{id}-{cluster}"]
CM -->|no| Skip["Skip ConfigMap"]
I --> HR["HelmRepository<br/>charts-{id}"]
I --> HRL["HelmRelease<br/>platform-{id}"]
ConfigMap -.->|"valuesFrom"| HRL
HR -.->|"sourceRef"| HRL
Full Template
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: platform-components
namespace: flux-system
spec:
inputsFrom:
- name: platform-components
resourcesTemplate: |
<<- if inputs.enabled >>
<<- if inputs.patches >>
---
apiVersion: v1
kind: ConfigMap
metadata:
name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
namespace: flux-system
data:
<<- range $key, $value := inputs.patches >>
<< $key >>: "<< $value >>"
<<- end >>
<<- end >>
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: charts-<< inputs.id | slugify >>
namespace: flux-system
spec:
interval: 30m
url: "<< inputs.source.oci_url >>"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: platform-<< inputs.id >>
namespace: flux-system
spec:
interval: 10m
releaseName: << inputs.id | slugify >>
targetNamespace: << inputs.id | slugify >>
install:
remediation:
retries: 3
upgrade:
remediation:
retries: 3
chart:
spec:
chart: << inputs.component_path >>
sourceRef:
kind: HelmRepository
name: charts-<< inputs.id | slugify >>
namespace: flux-system
interval: 10m
<<- if ne inputs.component_version "latest" >>
version: "<< inputs.component_version >>"
<<- end >>
<<- if inputs.depends_on >>
dependsOn:
<<- range $dep := inputs.depends_on >>
- name: platform-<< $dep >>
<<- end >>
<<- end >>
<<- if inputs.patches >>
valuesFrom:
<<- range $key, $_ := inputs.patches >>
- kind: ConfigMap
name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
valuesKey: << $key >>
targetPath: << $key >>
<<- end >>
<<- end >>
<<- end >>
What Each Section Does
Enabled check (<<- if inputs.enabled >>) — If the component is disabled, nothing is rendered. Flux garbage-collects previously rendered resources.
ConfigMap for patches — If the component has patches, a ConfigMap is created with the key-value pairs. The HelmRelease references this ConfigMap via valuesFrom, which maps each key to a Helm value path using targetPath.
HelmRepository — Points to the chart repository URL from inputs.source.oci_url.
HelmRelease — The core resource. Key behaviors:
- `chart` references the HelmRepository and uses `inputs.component_path` as the chart name
- `version` is only set if `component_version` is not `"latest"`
- `dependsOn` creates ordering dependencies between components
- `valuesFrom` injects per-cluster patches from the ConfigMap
Namespaces Template
Renders a Kubernetes Namespace for each input:
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: namespaces
namespace: flux-system
spec:
inputsFrom:
- name: namespaces
resourcesTemplate: |
---
apiVersion: v1
kind: Namespace
metadata:
name: << inputs.id >>
labels:
<<- range $k, $v := inputs.labels >>
<< $k >>: "<< $v >>"
<<- end >>
annotations:
<<- range $k, $v := inputs.annotations >>
<< $k >>: "<< $v >>"
<<- end >>
Labels and annotations from the API response are dynamically rendered using range.
Rolebindings Template
Renders a ClusterRoleBinding for each input:
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: rolebindings
namespace: flux-system
spec:
inputsFrom:
- name: rolebindings
resourcesTemplate: |
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: << inputs.id >>
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: << inputs.role >>
subjects:
<<- range $s := inputs.subjects >>
- kind: << $s.kind >>
name: << $s.name >>
apiGroup: << $s.apiGroup >>
<<- end >>
Template Design Principles
- One ResourceSet per resource type — keeps templates focused and failures isolated
- Conditional rendering — use `if` blocks to skip disabled components or optional fields
- Slugify names — Kubernetes resource names must be DNS-compatible; `slugify` handles this
- Garbage collection — when an input disappears from the API response, Flux removes the resources that ResourceSet previously created
- No cluster-specific logic in templates — all cluster differentiation comes from the API data, not from template conditionals
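For intuition, slugify behaves roughly like the following approximation; the Flux Operator's actual filter may differ in edge cases, so treat this as an illustrative Python sketch rather than its exact semantics:

```python
import re

# Rough approximation of a slugify filter for Kubernetes names:
# lowercase, collapse runs of non-alphanumerics to "-", trim hyphens.
def slugify(value: str) -> str:
    value = re.sub(r"[^a-z0-9]+", "-", value.lower())
    return value.strip("-")
```

This turns arbitrary input like `"Cert Manager"` into a DNS-compatible name.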
Further Reading
- ResourceSet CRD reference — full spec, status fields, inventory tracking, and health checks
- ResourceSetInputProvider CRD reference — input types, polling configuration, authentication options
- Flux Operator GitHub — source code and issue tracker
ResourceSetInputProvider
The ResourceSetInputProvider is the Flux Operator CRD that tells a ResourceSet where to fetch its input data. In this architecture, every provider uses type: ExternalService to call the flux-resourceset API.
Upstream reference: See the full ResourceSetInputProvider CRD documentation for all supported input types, authentication options, and status conditions.
How Providers Work
flowchart LR
subgraph "flux-system namespace"
P["ResourceSetInputProvider<br/>type: ExternalService"]
S["Secret<br/>internal-api-token"]
RS["ResourceSet"]
end
API["flux-resourceset API"]
P -->|"GET (with bearer token)"| API
S -.->|"secretRef"| P
P -->|"provides inputs"| RS
RS -->|"renders resources"| K8s["Kubernetes Resources"]
Provider Configuration
Each provider specifies:
- type — `ExternalService` (calls an HTTP API)
- url — the endpoint to call
- secretRef — Kubernetes Secret containing the bearer token
- reconcileEvery — how often to poll (annotation)
Platform Components Provider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: platform-components
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
type: ExternalService
url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components
insecure: true
secretRef:
name: internal-api-token
Namespaces Provider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: namespaces
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
type: ExternalService
url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/namespaces
insecure: true
secretRef:
name: internal-api-token
Rolebindings Provider
Follows the same pattern, using the `/rolebindings` endpoint.
URL Construction
In production, the provider URL uses variable substitution from the cluster-identity ConfigMap:
url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/platform-components"
This means the same provider manifest works on every cluster — only the ConfigMap values differ.
In the demo, the URL is hardcoded to the in-cluster service address and a demo cluster DNS.
Authentication
The provider references a Secret that contains the bearer token:
apiVersion: v1
kind: Secret
metadata:
name: internal-api-token
namespace: flux-system
type: Opaque
stringData:
token: "your-bearer-token-here"
The Flux Operator sends this as Authorization: Bearer <token> on every request.
For production, consider:
- Token rotation — update the Secret, Flux picks up the new token on next request
- mTLS — ResourceSetInputProvider supports `certSecretRef` for TLS client certificates
Reconciliation Behavior
| Event | Provider Behavior |
|---|---|
| Scheduled interval | Provider calls the API, ResourceSet re-renders if inputs changed |
| API returns same data | No change — ResourceSet does not re-render |
| API returns new data | ResourceSet re-renders, Flux applies the diff |
| API returns error | Provider goes not-ready, existing resources continue running |
| API unreachable | Same as error — graceful degradation |
| Manual trigger | Annotate with fluxcd.controlplane.io/requestedAt to force immediate reconcile |
Forcing Immediate Reconciliation
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
Observing Provider Status
# Check provider status
kubectl get resourcesetinputproviders -n flux-system
# Detailed status with conditions
kubectl describe resourcesetinputprovider platform-components -n flux-system
# Check ResourceSet status
kubectl get resourcesets -n flux-system
Further Reading
- ResourceSetInputProvider CRD reference — full spec, input types, auth options, status conditions
- ResourceSet CRD reference — the templating CRD that consumes provider inputs
- Flux Operator — project home with installation guides and examples
Dynamic Patching
Dynamic patching is one of the most powerful features of this architecture. It allows per-cluster, per-component value overrides without modifying Git, the component catalog, or any template. Operators can change Helm values, replica counts, feature flags, and more — and Flux reconciles the change automatically.
How Patching Works
sequenceDiagram
participant Op as Operator
participant API as flux-resourceset API
participant DB as Data Store
participant Flux as Child Cluster (Flux)
Op->>API: PATCH cluster "us-east-prod-01"<br/>patches.grafana.replicaCount = "3"
API->>DB: Update cluster document
API-->>Op: 200 OK
Note over Flux: Next poll cycle
Flux->>API: GET /clusters/{dns}/platform-components
API->>DB: Read cluster + catalog
API->>API: Merge: inject patches.grafana into grafana input
API-->>Flux: {"inputs": [{..., "patches": {"replicaCount": "3"}}]}
Flux->>Flux: ResourceSet renders ConfigMap with replicaCount=3
Flux->>Flux: HelmRelease references ConfigMap via valuesFrom
Flux->>Flux: Helm upgrade applies new replica count
The Patches Object
Patches are stored in the cluster document, keyed by component ID:
{
"cluster_dns": "us-east-prod-01.k8s.example.com",
"patches": {
"grafana": {
"replicaCount": "3",
"persistence.storageClassName": "ssd"
},
"podinfo": {
"replicaCount": "2",
"ui.color": "#2f855a",
"ui.message": "Hello from patches"
},
"traefik": {
"deployment.replicas": "1",
"service.type": "ClusterIP"
}
}
}
Each key in a component’s patches maps to a Helm value path. Dotted keys (like ui.color) map to nested Helm values.
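The dotted-key expansion can be illustrated like this — conceptually what `targetPath` achieves when Helm assembles the values tree (illustrative Python, not Flux's implementation):

```python
# Expand flat dotted patch keys into a nested Helm-style values tree.
def expand_patches(patches: dict) -> dict:
    values = {}
    for dotted_key, value in patches.items():
        node = values
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating nodes as needed
        node[leaf] = value
    return values
```

So `{"replicaCount": "2", "ui.color": "#2f855a"}` becomes `{"replicaCount": "2", "ui": {"color": "#2f855a"}}` in the chart's values.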
How Patches Become Helm Values
The ResourceSet template renders patches into a ConfigMap, then references it from the HelmRelease via valuesFrom:
flowchart TD
A["API Response<br/>patches: {replicaCount: '2', ui.color: '#2f855a'}"]
B["ConfigMap<br/>values-podinfo-cluster"]
C["HelmRelease<br/>platform-podinfo"]
D["Helm Chart<br/>podinfo"]
A -->|"ResourceSet renders"| B
B -->|"valuesFrom with targetPath"| C
C -->|"helm upgrade"| D
B -.- B1["data:<br/> replicaCount: '2'<br/> ui.color: '#2f855a'"]
C -.- C1["valuesFrom:<br/> - kind: ConfigMap<br/> valuesKey: replicaCount<br/> targetPath: replicaCount<br/> - kind: ConfigMap<br/> valuesKey: ui.color<br/> targetPath: ui.color"]
The targetPath in valuesFrom tells Helm where to inject the value in the chart’s values tree. This is a standard Flux HelmRelease feature — the innovation is that the values are computed from the API, not hardcoded in Git.
In the demo template, each generated values ConfigMap is labeled reconcile.fluxcd.io/watch: "Enabled" and each generated HelmRelease uses interval: 1m. This gives fast event-driven upgrades when values change, plus a short periodic poll interval.
Patching via CLI
The demo includes a CLI command to patch any component with dynamic key=value paths:
# Patch podinfo values on demo-cluster-01
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
--set replicaCount=3 \
--set ui.message="Hello from CLI patch" \
--set ui.color="#3b82f6"
This updates the cluster document’s patches.podinfo object in the data store.
Patching Use Cases
| Use Case | Patch Example | Effect |
|---|---|---|
| Scale a component | {"replicaCount": "3"} | Component scales to 3 replicas |
| Change UI branding | {"ui.color": "#ff0000", "ui.message": "Maintenance"} | Application UI reflects new values |
| Environment-specific tuning | {"resources.limits.memory": "512Mi"} | Different resource limits per cluster |
| Feature flags | {"feature.newDashboard": "true"} | Enable features per cluster |
| Ingress configuration | {"ingress.className": "internal"} | Different ingress class per cluster |
Patching vs. Other Override Mechanisms
| Mechanism | Scope | Requires Git PR? | Use Case |
|---|---|---|---|
| Catalog defaults | All clusters using the component | Yes (schema change) | Global default values |
| OCI tag override | One cluster, one component | No (API call) | Hotfix or canary version |
| Component path override | One cluster, one component | No (API call) | Component version upgrade |
| Patches | One cluster, one component | No (API call) | Value tuning, feature flags, scaling |
| Template changes | All clusters (template is global) | Yes (Git PR) | Changing how resources are rendered |
Patches are the most granular override — they change individual Helm values without affecting any other cluster or component.
Verifying Patches
After patching, verify the change propagated:
# Reconcile quickly
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# Check reconcile result
kubectl get hr -n flux-system platform-podinfo \
-o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'
# Check the actual deployment
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# Check rendered values
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
Multi-Cluster Management
This architecture is designed from the ground up for managing hundreds to thousands of Kubernetes clusters. The phone-home model, stateless API, and resource-driven data model all contribute to linear scaling without operational complexity growth.
Scaling Properties
graph TB
subgraph "Enterprise Fleet"
direction TB
DEV1["DEV Cluster 1"]
DEV2["DEV Cluster 2"]
DEV3["DEV Cluster ...N"]
QA1["QA Cluster 1"]
QA2["QA Cluster 2"]
UAT1["UAT Cluster 1"]
PROD1["PROD Cluster 1"]
PROD2["PROD Cluster 2"]
PROD3["PROD Cluster ...N"]
end
API["flux-resourceset API<br/>(stateless, multi-replica)"]
DEV1 & DEV2 & DEV3 -->|"poll"| API
QA1 & QA2 -->|"poll"| API
UAT1 -->|"poll"| API
PROD1 & PROD2 & PROD3 -->|"poll"| API
Why It Scales
| Property | How |
|---|---|
| Stateless API | No per-cluster state in the API process. Add replicas for HA, not for capacity. |
| Pull-based | Each cluster owns its own reconciliation loop. The API does not need to track cluster connectivity. |
| Minimal request cost | Each request = 1 data store read + 1 merge. Sub-millisecond response time. |
| Independent failures | One cluster’s provider failing does not affect any other cluster. |
| Linear polling load | 1,000 clusters polling 3 endpoints every 5 minutes = 10 req/sec. Trivial for any HTTP service. |
Fleet-Wide Operations
Rolling Out a New Component
When a new platform component needs to be deployed across the fleet:
flowchart TD
A["1. Add to component catalog<br/>(one API call)"] --> B["2. Add component_ref to<br/>target clusters<br/>(batch API calls)"]
B --> C["3. Clusters poll on schedule"]
C --> D["4. Each cluster independently<br/>installs the component"]
B -->|"DEV first"| C1["DEV clusters pick up change"]
B -->|"then QA"| C2["QA clusters pick up change"]
B -->|"then PROD"| C3["PROD clusters pick up change"]
You control rollout speed by controlling when you add the component_ref to each tier’s clusters. No pipeline orchestration — just API calls.
Upgrading a Component Version
To upgrade grafana from 17.0.0 to 17.1.0 across the fleet:
- Ensure the new version exists in the platform components OCI artifact
- Update the catalog’s `component_path` from `observability/grafana/17.0.0` to `observability/grafana/17.1.0`
- All clusters using catalog defaults pick up the change on next poll
For canary rollouts, override specific clusters first:
{
"platform_components": [
{
"id": "grafana",
"component_path": "observability/grafana/17.1.0",
"oci_tag": "v1.1.0-rc1"
}
]
}
DEV gets the new version. PROD stays on the catalog default.
Hotfix Workflow
flowchart LR
A["CVE discovered<br/>in cert-manager"] --> B["Fix merged to repo<br/>OCI artifact v1.0.0-1 built"]
B --> C["Update affected clusters<br/>oci_tag: v1.0.0-1"]
C --> D["Clusters poll and<br/>reconcile the fix"]
C -->|"Only cert-manager<br/>is affected"| E["All other components<br/>stay on v1.0.0"]
Hotfixes are per-component, per-cluster. You update oci_tag on the specific component for the specific clusters that need the fix. No full release cycle required.
Environment Tiers
The architecture has first-class support for environment-based differentiation:
| Mechanism | How It Works |
|---|---|
| `cluster.environment` | Each cluster document has an `environment` field (`dev`, `qa`, `uat`, `prod`). Included in every API response. |
| `cluster_env_enabled` | When `true` on a catalog component, the ResourceSet template appends `/{environment}` to the component path. Different environment tiers get different Kustomize overlays. |
| Per-cluster patches | Different Helm values per cluster. PROD gets 5 replicas, DEV gets 1. |
| OCI tag overrides | DEV clusters can pin to release candidates while PROD stays on stable. |
Environment-Aware Path Resolution
When cluster_env_enabled is true:
Catalog component_path: core/cert-manager/1.14.0
Cluster environment: prod
→ Resolved path: core/cert-manager/1.14.0/prod
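The rule is a straight path join — a minimal sketch:

```python
# Environment-aware path resolution: append the environment segment only
# when the catalog component opts in via cluster_env_enabled.
def resolve_component_path(component_path: str, environment: str, cluster_env_enabled: bool) -> str:
    if cluster_env_enabled:
        return f"{component_path}/{environment}"
    return component_path
```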
This enables the platform components repo to have environment-specific Kustomize overlays:
core/cert-manager/1.14.0/
├── base/
│ └── deployment.yaml
├── dev/
│ └── kustomization.yaml
├── qa/
│ └── kustomization.yaml
└── prod/
└── kustomization.yaml
Decommissioning a Cluster
flowchart LR
A["Delete cluster record<br/>from API"] --> B["All endpoints return<br/>empty inputs"]
B --> C["ResourceSets render<br/>empty resource list"]
C --> D["Flux garbage-collects<br/>all resources"]
No manual cleanup. No orphaned resources. The data model drives everything.
Enterprise Benefits Summary
| Benefit | Description |
|---|---|
| Single source of truth | One API holds the desired state for every cluster. No separate configuration management inventory, no spreadsheets, no wiki pages. |
| Cluster creation in minutes | Bootstrap cluster + phone home + reconcile. No weeks-long process involving manual playbooks and ticket queues. |
| Zero state divergence | API data = ResourceSet input = running cluster state. Drift is automatically corrected. |
| Operational velocity | Change a value via API → Flux reconciles. No PR, no review, no pipeline for operational changes. |
| Audit trail | Every API mutation is logged. Template changes go through Git. Full traceability. |
| Team autonomy | Platform engineers own templates (Git). Platform operators own data (API). Flux owns reconciliation. |
| Failure isolation | Each cluster is independent. API outage = no new changes, not cluster outage. |
| Cost efficiency | Stateless API uses minimal resources. No management cluster scaling with fleet size. |
| Infrastructure-agnostic | Same model works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |
Versioning & Hotfix Strategy
The platform components repo uses a versioning model with two independent axes of change: the OCI artifact tag and the component path within that artifact. This enables fine-grained control over what each cluster runs.
Two Axes of Version Control
graph TD
subgraph "OCI Artifact (tagged build of the repo)"
subgraph "v1.0.0"
A1["core/cert-manager/1.14.0/"]
A2["observability/grafana/17.0.0/"]
A3["networking/ingress-nginx/4.9.0/"]
end
end
subgraph "OCI Artifact (hotfix build)"
subgraph "v1.0.0-1"
B1["core/cert-manager/1.14.1/ ← fixed"]
B2["observability/grafana/17.0.0/"]
B3["networking/ingress-nginx/4.9.0/"]
end
end
| Axis | What It Controls | How It Changes |
|---|---|---|
| OCI tag | Which build of the monorepo artifact to pull | New tag on each merge to main (v1.0.0, v1.0.0-1, v1.1.0) |
| Component path | Which version directory within the artifact to use | Update component_path in the API (observability/grafana/17.0.0 → 17.1.0) |
Normal Release Flow
flowchart LR
A["All components at<br/>v1.0.0"] --> B["New feature merged<br/>to platform repo"]
B --> C["CI builds and tags<br/>v1.1.0"]
C --> D["Update catalog<br/>oci_tag: v1.1.0"]
D --> E["All clusters pull<br/>v1.1.0 on next poll"]
In a normal release, all components point to the same OCI tag. The catalog default is updated, and every cluster picks it up.
Hotfix Flow
flowchart LR
A["CVE in cert-manager<br/>All clusters on v1.0.0"] --> B["Fix merged to<br/>cert-manager/1.14.1/"]
B --> C["CI builds and tags<br/>v1.0.0-1"]
C --> D["Update cert-manager's<br/>oci_tag to v1.0.0-1<br/>component_path to<br/>cert-manager/1.14.1"]
D --> E["Only cert-manager<br/>updates on affected clusters"]
E --> F["All other components<br/>stay on v1.0.0"]
Hotfixes use SemVer pre-release suffixes: v1.0.0-1, v1.0.0-2. This keeps them:
- Sortable — `v1.0.0-1 < v1.0.0-2 < v1.1.0`
- Tied to base release — clear which release they patch
- Temporary — the next full release collapses everything back to one tag
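Under this convention, a sort key can treat the suffix as a numeric hotfix counter. Note the hedge: strict SemVer would rank a `-N` pre-release *before* its base tag, whereas this repo's convention ranks hotfixes *after* the base release — the sketch below implements the repo's convention:

```python
# Sort key for vMAJOR.MINOR.PATCH tags with an optional numeric hotfix
# suffix (-N). Hotfixes sort after the base tag (repo convention, not
# strict SemVer pre-release precedence).
def tag_key(tag: str):
    base, _, hotfix = tag.lstrip("v").partition("-")
    major, minor, patch = (int(p) for p in base.split("."))
    return (major, minor, patch, int(hotfix) if hotfix else 0)
```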
Per-Cluster Version Pinning
Any cluster can be pinned to a different version than the catalog default:
{
"platform_components": [
{
"id": "grafana",
"oci_tag": "v1.1.0-rc1",
"component_path": "observability/grafana/17.1.0"
}
]
}
Use cases:
- Canary testing — DEV cluster gets the release candidate
- Rollback — pin a PROD cluster to the previous version while investigating
- Gradual rollout — update clusters one tier at a time
Component Lifecycle
stateDiagram-v2
[*] --> CatalogEntry: Add to catalog
CatalogEntry --> AssignedToCluster: Add component_ref to cluster
AssignedToCluster --> Running: Flux reconciles
Running --> Upgraded: Update component_path/oci_tag
Upgraded --> Running: Flux reconciles new version
Running --> Hotfixed: Per-cluster oci_tag override
Hotfixed --> Running: Next full release
Running --> Disabled: Set enabled=false
Disabled --> Running: Set enabled=true
Running --> Removed: Remove component_ref
Removed --> GarbageCollected: Flux cleans up
GarbageCollected --> [*]
Platform Components Repo Structure
appteam-flux-repo/
├── COMPONENTS.yaml # Registry — CI-validated
├── core/
│ └── cert-manager/
│ ├── 1.14.0/
│ │ ├── base/ # Shared resources
│ │ ├── dev/
│ │ │ └── kustomization.yaml
│ │ ├── qa/
│ │ │ └── kustomization.yaml
│ │ └── prod/
│ │ └── kustomization.yaml
│ └── 1.14.1/ # Hotfix version
│ └── ...
├── observability/
│ └── grafana/
│ ├── 17.0.0/
│ │ └── ...
│ └── 17.1.0/ # Upgrade version
│ └── ...
└── networking/
└── ingress-nginx/
└── 4.9.0/
└── ...
Each environment directory must be buildable in isolation: kustomize build core/cert-manager/1.14.0/prod/ must succeed.
Version Cleanup
Keep N previous versions per component (recommended: 3). CI can prune older version directories. Old OCI tags remain in the registry for emergency rollbacks.
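A pruning pass can be sketched as follows — illustrative only, since the actual CI tooling is not specified here:

```python
# Keep the newest N version directories per component; return the rest
# (oldest first) as candidates for deletion. Assumes dotted numeric versions.
def prune_versions(versions, keep=3):
    key = lambda v: tuple(int(p) for p in v.split("."))
    kept = sorted(versions, key=key, reverse=True)[:keep]
    return sorted(set(versions) - set(kept), key=key)
```

With `keep=3`, a component holding `1.12.0`, `1.13.2`, `1.14.0`, and `1.14.1` would have only `1.12.0` pruned.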
Security & Authentication
Security in this architecture operates at multiple layers: API authentication, cluster identity, network boundaries, and credential management.
Authentication Model
flowchart TD
subgraph "Child Cluster"
S["Secret: internal-api-token"]
P["ResourceSetInputProvider"]
S -->|"Bearer token"| P
end
subgraph "API Layer"
AUTH["Auth Middleware"]
API["flux-resourceset"]
AUTH -->|"validated"| API
end
P -->|"Authorization: Bearer <token>"| AUTH
Bearer Token Authentication
The API uses bearer token authentication. Tokens are configured via environment variables:
- `AUTH_TOKEN` — required for all read endpoints
- `CRUD_AUTH_TOKEN` — required for write endpoints in CRUD mode (defaults to `AUTH_TOKEN` if not set)
This separation allows:
- Read-only clusters to use a shared read token
- Operators/CI to use a separate write token
- Token rotation without affecting cluster polling (rotate read and write tokens independently)
Cluster-Side Token Storage
Each cluster stores the token in a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
name: internal-api-token
namespace: flux-system
type: Opaque
stringData:
token: "the-bearer-token"
This Secret is either:
- Pre-installed in the cluster’s bootstrap image or manifests
- Injected during cluster provisioning (via cloud-init, Terraform, Cluster API, or manual setup)
- Managed by an external secrets operator that fetches the token from a vault
Upgrading to mTLS
For stricter security requirements, the ResourceSetInputProvider supports TLS client certificates via certSecretRef:
spec:
type: ExternalService
url: https://internal-api.internal.example.com/api/v2/flux/...
certSecretRef:
name: api-client-cert
This eliminates shared bearer tokens in favor of per-cluster x.509 certificates. The API would need to be configured with a TLS server certificate and a CA trust chain.
Network Security
Recommended Network Boundaries
| Connection | Direction | Protocol | Authentication |
|---|---|---|---|
| Cluster → API | Outbound from cluster | HTTPS | Bearer token or mTLS |
| Operator → API (CRUD) | Inbound to CRUD instance | HTTPS | Bearer token (write) |
| API → Data Store | Local/internal | SQLite file access (or in-memory) | Filesystem permissions |
Network Policy Considerations
- The API does not need inbound access to clusters — it is purely pull-based
- Only the `flux-system` namespace on each cluster needs outbound access to the API
- CRUD endpoints should be restricted to operator networks or CI/CD runners
Cluster Identity
The cluster-identity ConfigMap is the root of trust for each cluster:
data:
CLUSTER_NAME: "us-east-prod-01"
CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
ENVIRONMENT: "prod"
INTERNAL_API_URL: "https://internal-api.internal.example.com"
This ConfigMap determines:
- Which API endpoint the cluster calls
- Which cluster DNS is used in the URL path (determines what data the cluster receives)
- What environment tier the cluster belongs to
The ConfigMap is injected during cluster provisioning and should be treated as immutable after bootstrap.
Data Access Control
The API enforces access control at the endpoint level:
| Endpoint | Token Required | Access Level |
|---|---|---|
| `/api/v2/flux/...` | `AUTH_TOKEN` | Read-only — clusters can only read their own data via DNS path |
| `/clusters`, `/platform_components`, etc. | `CRUD_AUTH_TOKEN` | Read-write — operators can modify any cluster |
| `/health`, `/ready` | None | Public — Kubernetes probes |
| `/openapi.yaml` | None | Public — API documentation |
Per-Cluster Data Isolation
Each cluster can only access its own data because the API path includes the cluster DNS:
GET /api/v2/flux/clusters/us-east-prod-01.k8s.example.com/platform-components
A cluster cannot query another cluster’s configuration without knowing (and requesting) a different DNS path. The bearer token does not provide cross-cluster access control — all clusters share the same read token. If per-cluster token isolation is required, implement it as an API middleware enhancement.
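A minimal sketch of such a middleware check. The production service is Rust; this Python version only illustrates the logic, and the token mapping and function names are hypothetical:

```python
# Illustrative sketch of per-cluster token isolation. The shipped API does not
# include this; the token mapping and names here are hypothetical.

PER_CLUSTER_TOKENS = {
    # token -> the cluster DNS that token is allowed to read
    "tok-us-east": "us-east-prod-01.k8s.example.com",
}

def authorize_flux_read(path: str, bearer_token: str) -> bool:
    """Allow /api/v2/flux/clusters/{dns}/... only if the token owns {dns}."""
    prefix = "/api/v2/flux/clusters/"
    if not path.startswith(prefix):
        return False
    cluster_dns = path[len(prefix):].split("/", 1)[0]
    return PER_CLUSTER_TOKENS.get(bearer_token) == cluster_dns
```

With this in place, a cluster presenting its own token can read only its own paths; requests for any other cluster DNS are rejected.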
Secrets in the Data Model
The patches object supports arbitrary key-value pairs. Do not store sensitive values (passwords, API keys, private certificates) in patches. Instead:
- Use Kubernetes Secrets + ExternalSecrets Operator for sensitive values
- Use patches only for non-sensitive configuration (replica counts, feature flags, resource limits)
- For sensitive Helm values, use valuesFrom with a Secret instead of a ConfigMap
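A sketch of that last point: a HelmRelease consuming sensitive values from a Secret. The chart and Secret names are illustrative:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: platform-example
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: example
      sourceRef:
        kind: HelmRepository
        name: example-repo
  valuesFrom:
    # Sensitive values come from a Secret managed outside the API, e.g. by ESO
    - kind: Secret
      name: example-sensitive-values
      valuesKey: values.yaml
```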
Local Demo
This guide walks through running the full demo on a local kind cluster. By the end, you will have:
- A kind cluster with Flux Operator installed
- The flux-resourceset API deployed with seed data
- ResourceSetInputProviders polling the API
- ResourceSets rendering and reconciling platform components, namespaces, and rolebindings
Prerequisites
Required tools:
- Rust/Cargo — build the API and CLI
- Docker — container runtime for kind
- kind — local Kubernetes clusters
- kubectl — Kubernetes CLI
- flux CLI — manual reconcile commands (flux reconcile ...)
- curl — HTTP requests
Optional tools:
- jq — pretty JSON output
- Poetry + Python 3 — for make generate (code generation only)
- openapi-generator — for Rust model generation (code generation only)
One-Command Demo
cd flux-resourceset
make demo
This runs kind-create and kind-demo, which:
- Builds the Docker image (flux-resourceset:local)
- Creates a kind cluster named flux-demo
- Loads the image into the cluster
- Installs the Flux Operator from upstream
- Applies base Kubernetes manifests (FluxInstance, RBAC, services)
- Waits for Flux controllers to be ready
- Creates a seed data ConfigMap from data/seed.json
- Deploys the API (read-only + CRUD instances)
- Applies ResourceSetInputProviders
- Applies ResourceSets
What Gets Deployed
graph TB
subgraph "flux-system namespace"
API_R["flux-api-read<br/>(read-only mode)"]
API_C["flux-api-crud<br/>(CRUD mode)"]
SEED["ConfigMap: flux-api-seed-data"]
P1["Provider: platform-components"]
P2["Provider: namespaces"]
P3["Provider: rolebindings"]
RS1["ResourceSet: platform-components"]
RS2["ResourceSet: namespaces"]
RS3["ResourceSet: rolebindings"]
HR1["HelmRelease: platform-cert-manager"]
HR2["HelmRelease: platform-traefik"]
HR3["HelmRelease: platform-podinfo"]
end
subgraph "Created namespaces"
NS1["cert-manager"]
NS2["traefik"]
NS3["podinfo"]
end
SEED -->|"loaded at startup"| API_R
SEED -->|"loaded at startup"| API_C
P1 -->|"polls"| API_R
P2 -->|"polls"| API_R
P3 -->|"polls"| API_R
P1 --> RS1
P2 --> RS2
P3 --> RS3
RS1 -->|"renders"| HR1 & HR2 & HR3
RS2 -->|"renders"| NS1 & NS2 & NS3
RS3 -->|"renders"| CRB["ClusterRoleBindings:<br/>platform-admins, dev-readers"]
Seed Data
The demo uses data/seed.json which contains:
One cluster: demo-cluster-01
- Environment: dev
- 3 platform components: cert-manager, traefik, podinfo
- 3 namespaces: cert-manager, traefik, podinfo
- 2 rolebindings: platform-admins (cluster-admin), dev-readers (view)
- Patches for podinfo (replica count, UI color, UI message) and traefik (replicas, service type)
Three catalog entries: cert-manager, traefik, podinfo — each pointing to public Helm chart repositories.
Checking Status
After make demo, verify everything is running:
# Check pods
kubectl get pods -n flux-system
# Check providers
kubectl get resourcesetinputproviders -n flux-system
# Check resourcesets
kubectl get resourcesets -n flux-system
# Check HelmReleases
kubectl get helmreleases -n flux-system
# Check created namespaces
kubectl get namespaces
# Check rolebindings
kubectl get clusterrolebindings platform-admins dev-readers
Running the CLI Demo
The automated CLI demo flow exercises the full lifecycle:
Step 1: Port-forward the API
make cli-demo-port-forward
This exposes the API on http://127.0.0.1:8080.
Step 2: Run the CLI demo
In another terminal:
make cli-demo
This:
- Builds the CLI
- Lists clusters and namespaces
- Adds a new namespace (demo-runtime) via CLI
- Forces reconciliation
- Waits for the namespace to be created
- Verifies the namespace exists
Step 3: Manual CLI exploration
export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
-o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"
# List clusters
./target/debug/flux-resourceset-cli cluster list | jq .
# List namespaces
./target/debug/flux-resourceset-cli namespace list | jq .
# Get Flux-formatted platform components
curl -s -H "Authorization: Bearer $FLUX_API_TOKEN" \
http://127.0.0.1:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components | jq .
Podinfo Patch Demo
This demonstrates dynamic patching — changing Helm values via the API and watching Flux reconcile:
# 1. Check current state
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# 2. Patch via CLI
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
--set replicaCount=3 \
--set ui.message="Hello from CLI patch" \
--set ui.color="#3b82f6" | jq .
# 3. Force reconcile inputs/templates
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 4. Trigger immediate Helm reconcile
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# 5. Verify
kubectl get hr -n flux-system platform-podinfo \
-o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# 6. Optional: check the UI
kubectl -n podinfo port-forward svc/podinfo 9898:9898
# Open http://127.0.0.1:9898
Cleanup
make kind-delete
# or
make clean
This deletes the kind cluster and all associated resources.
CLI Usage
flux-resourceset-cli is a command-line tool for interacting with the CRUD API. It is built from the same codebase and generated from the same Firestone schemas as the API.
Building
cd flux-resourceset
cargo build --bin flux-resourceset-cli
The binary is at target/debug/flux-resourceset-cli.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| FLUX_API_URL | yes | API base URL (e.g., http://127.0.0.1:8080) |
| FLUX_API_TOKEN | yes | Bearer token for read operations |
| FLUX_API_WRITE_TOKEN | yes | Bearer token for write operations |
Setup from Demo Cluster
export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
-o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"
Commands
Cluster Operations
# List all clusters
flux-resourceset-cli cluster list
# Get a specific cluster
flux-resourceset-cli cluster get demo-cluster-01
Namespace Operations
# List all namespaces
flux-resourceset-cli namespace list
# Get a specific namespace
flux-resourceset-cli namespace get cert-manager
# Create namespace record and attach reference to a cluster
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
--label team=sandbox --annotation owner=platform
# Attach/detach an existing namespace record
flux-resourceset-cli namespace assign team-sandbox --cluster demo-cluster-01
flux-resourceset-cli namespace unassign team-sandbox --cluster demo-cluster-01
Platform Component Operations
# List all catalog components
flux-resourceset-cli component list
# Get a specific component
flux-resourceset-cli component get cert-manager
# Create/ensure catalog component, then attach to cluster
flux-resourceset-cli component create cert-manager \
--component-path core/cert-manager/1.14.0 \
--component-version 1.14.0 \
--oci-url oci://registry.example/platform-components \
--oci-tag v1.0.0 \
--cluster demo-cluster-01
# Attach/detach existing component references
flux-resourceset-cli component assign cert-manager --cluster demo-cluster-01
flux-resourceset-cli component unassign cert-manager --cluster demo-cluster-01
# Patch per-cluster component values
flux-resourceset-cli component patch podinfo --cluster demo-cluster-01 --set replicaCount=3
Demo Commands
The CLI includes demo-specific commands for common workflows:
# Add a namespace to a cluster
flux-resourceset-cli demo add-namespace <cluster-id> <namespace> \
--label team=platform \
--annotation owner=you
# Patch one component using dynamic key/value paths
flux-resourceset-cli demo patch-component <cluster-id> <component-id> \
--set replicaCount=3 \
--set ui.message="Hello" \
--set ui.color="#3b82f6"
# Get Flux-formatted namespace response
flux-resourceset-cli demo flux-namespaces <cluster-dns>
Output
All CLI commands output JSON. Pipe to jq for pretty formatting:
flux-resourceset-cli cluster list | jq .
Workflow Examples
Add a namespace and watch Flux create it
# 1. Create namespace + attach reference
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
--label team=sandbox --annotation owner=platform
# 2. Force reconcile
kubectl annotate resourcesetinputprovider namespaces -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset namespaces -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 3. Wait and verify
kubectl get ns team-sandbox
Patch a component and verify
# 1. Patch
flux-resourceset-cli demo patch-component demo-cluster-01 podinfo --set replicaCount=5
# 2. Refresh provider + resourceset
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 3. Trigger immediate Helm upgrade
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# 4. Verify
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
Extending with New Resource Types
The architecture is designed to be extended with new resource types beyond the initial three (platform-components, namespaces, rolebindings). Adding a new resource type follows a consistent pattern.
The Pattern
Every resource type requires four pieces:
flowchart TD
A["1. Data Schema<br/>(Firestone resource definition)"] --> B["2. API Endpoint<br/>(returns {inputs: [...]})"]
B --> C["3. ResourceSetInputProvider<br/>(calls the endpoint)"]
C --> D["4. ResourceSet Template<br/>(renders Kubernetes resources)"]
Step-by-Step: Adding Network Policies
Let’s walk through adding a network-policies resource type.
Step 1: Define the Firestone Schema
Create resources/network_policy.yaml:
kind: network_policy
apiVersion: v1
schema:
type: object
required: [id, target_namespace, ingress_rules]
properties:
id:
type: string
example: allow-monitoring
target_namespace:
type: string
example: monitoring
ingress_rules:
type: array
items:
type: object
properties:
from_namespace:
type: string
port:
type: integer
Step 2: Add to the Cluster Schema
In resources/cluster.yaml, add a network_policies array:
network_policies:
type: array
items:
$ref: "#/components/schemas/network_policy_ref"
description: Network policies to sync to this cluster.
Step 3: Regenerate Code
make generate
This updates the OpenAPI spec, Rust models, and CLI modules.
Step 4: Implement the API Endpoint
Add GET /api/v2/flux/clusters/{cluster_dns}/network-policies that returns:
{
"inputs": [
{
"id": "allow-monitoring",
"target_namespace": "monitoring",
"ingress_rules": [
{ "from_namespace": "prometheus", "port": 9090 }
],
"cluster": {
"name": "us-east-prod-01",
"dns": "us-east-prod-01.k8s.example.com",
"environment": "prod"
}
}
]
}
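The response follows the same contract as the other endpoints: each record assigned to the cluster is wrapped in {"inputs": [...]} with the cluster identity block attached. A minimal sketch of that assembly (the real service is Rust; this Python version and its names are illustrative only):

```python
def build_flux_inputs(cluster: dict, records: list[dict]) -> dict:
    """Wrap per-cluster records in the {"inputs": [...]} contract,
    attaching the cluster identity block to every input."""
    identity = {
        "name": cluster["name"],
        "dns": cluster["dns"],
        "environment": cluster["environment"],
    }
    return {"inputs": [{**r, "cluster": identity} for r in records]}
```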
Step 5: Create the ResourceSetInputProvider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: network-policies
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "5m"
spec:
type: ExternalService
url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/network-policies"
secretRef:
name: internal-api-token
Step 6: Create the ResourceSet Template
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: network-policies
namespace: flux-system
spec:
inputsFrom:
- name: network-policies
resourcesTemplate: |
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: << inputs.id >>
namespace: << inputs.target_namespace >>
spec:
podSelector: {}
policyTypes:
- Ingress
ingress:
<<- range $rule := inputs.ingress_rules >>
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: << $rule.from_namespace >>
ports:
- port: << $rule.port >>
<<- end >>
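Given the sample API response from Step 4, this template renders roughly the following resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: monitoring
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: prometheus
      ports:
        - port: 9090
```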
Step 7: Deploy
Add the provider and ResourceSet to the bootstrap manifests (for new clusters) and apply them to existing clusters.
What Makes This Extensible
| Aspect | How It Helps |
|---|---|
| Consistent contract | Every resource type uses {"inputs": [...]} — same provider, same pattern |
| Independent providers | Each resource type polls independently — no coupling |
| Schema-driven | Firestone generates models, OpenAPI, and CLI for new types automatically |
| Template isolation | Each ResourceSet template handles one type — no monolithic templates |
Ideas for Additional Resource Types
| Resource Type | Kubernetes Resources | Use Case |
|---|---|---|
| Network Policies | NetworkPolicy | Per-cluster network segmentation |
| Resource Quotas | ResourceQuota, LimitRange | Namespace resource limits |
| Secrets | ExternalSecret (ESO) | Centralized secret management |
| Ingress Routes | Ingress, IngressRoute | Per-cluster routing rules |
| Custom CRDs | Any custom resource | Organization-specific resources |
Each follows the same four-piece pattern: schema, endpoint, provider, template.
Frequently Asked Questions
Architecture & Design Decisions
Why an API instead of direct Kubernetes API access?
A common reaction is: “Why not just give operators kubectl access or build tooling that talks directly to the Kubernetes API on each cluster?”
The answer comes down to control, safety, and scale:
| Concern | Direct Kubernetes API | Purpose-built API (this) |
|---|---|---|
| Blast radius | One bad kubectl apply can break a cluster. Operators need kubeconfig access to every cluster. | All changes flow through a single API with validation. No direct cluster access needed for platform operations. |
| Business logic | The Kubernetes API has no concept of “platform components,” “environment tiers,” or “component catalogs.” You build that logic into scripts. | The API encodes your organization’s domain model. Merge logic, catalog defaults, environment resolution, and patching rules are built in. |
| Audit trail | Kubernetes audit logs are per-cluster and verbose. Correlating “who changed what across 200 clusters” is painful. | One API, one audit log. Every mutation is traceable to a user, timestamp, and change payload. |
| Integration | Integrating CI/CD, chatops, ticketing, or approval workflows with raw Kubernetes APIs across many clusters requires custom glue per cluster. | One REST API to integrate with. Webhooks, CI pipelines, Slack bots, and approval systems all talk to one endpoint. |
| Credential management | Operators (or CI) need kubeconfigs for every cluster. Rotating credentials means touching every cluster. | Operators need one API token. Clusters hold one read token. Token rotation is centralized. |
| Consistency | Without enforcement, two operators can configure the same component differently on two clusters. Scripts drift. | The catalog + merge model guarantees consistent computed state. Per-cluster differences are explicit and auditable. |
| Rollback | Rolling back a kubectl apply requires knowing exactly what was applied and in what order. | Revert the API data. Next poll cycle, Flux reconciles back. |
In short: The Kubernetes API is a powerful infrastructure primitive, but it is not a platform management API. This service adds the domain logic, guardrails, and integration surface that enterprise operations require.
Is this actually GitOps?
Yes — with a nuance. This is a GitOps-based model that adds an API-driven data layer.
The GitOps principles are preserved:
- Declarative — desired state is declared in structured data (API) and templates (Git)
- Versioned and immutable — templates are version-controlled in Git. API data changes are auditable and reversible.
- Pulled automatically — clusters pull their state; no manual push required
- Continuously reconciled — Flux detects and corrects drift automatically
What the API adds:
- Dynamic data — instead of static YAML files per cluster, the API computes each cluster’s state from catalog + overrides
- Operational velocity — data changes (scaling, patching, enabling/disabling) do not require Git PRs
- Business logic — merge rules, catalog defaults, and environment resolution happen in the API, not in Git overlays
The templates that govern how resources are deployed still live in Git and go through standard review. The API controls what is deployed where — the operational data plane.
Why not ArgoCD ApplicationSets?
ArgoCD ApplicationSets solve a similar problem (managing resources across many clusters) but take a fundamentally different approach:
| Aspect | ArgoCD ApplicationSets | This architecture |
|---|---|---|
| Model | Push from management cluster | Pull from each cluster |
| Management cluster dependency | Required — ArgoCD must maintain connections to all clusters | Not required for platform management — clusters are autonomous |
| Failure mode | Management cluster down = no reconciliation anywhere | API down = clusters keep running, just cannot get updates |
| Kubeconfig management | ArgoCD needs kubeconfigs for every target cluster | Each cluster holds one API bearer token |
| Network direction | Management cluster → target clusters (requires inbound access to clusters) | Target clusters → API (outbound only) |
| Data source | Git repos with generators (list, cluster, git, matrix) | API with merge logic and dynamic catalog |
| Per-cluster overrides | Generators + overlays (can get complex) | First-class patches object in the API |
Both are valid approaches. ApplicationSets work well when you have a stable management cluster with reliable connectivity to all targets. The phone-home model works better when clusters are distributed, network connectivity is unreliable, or you need clusters to be autonomous.
Does this work on-premises?
Yes. The architecture is infrastructure-agnostic. It has no dependency on any specific cloud provider, VM provisioner, or Kubernetes distribution.
| Environment | Requirements |
|---|---|
| On-prem bare metal | Kubernetes cluster with Flux Operator installed. Outbound HTTPS to the API. |
| On-prem VMs | Same — any hypervisor (VMware, KVM, Hyper-V). |
| Public cloud (EKS, AKS, GKE) | Deploy Flux Operator as a Helm chart or add-on. |
| Edge / remote sites | Lightweight K8s (k3s, k0s, MicroK8s). Can work over VPN or direct internet. |
| Air-gapped | Possible with a local API mirror and OCI registry mirror inside the air gap. |
| Hybrid | Mix any of the above. Every cluster phones home to the same API. |
The provisioning tooling is completely decoupled. Whether you use Terraform, Cluster API, Crossplane, Rancher, manual scripts, or your own management cluster — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop works.
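For example, a provisioning pipeline's final bootstrap step might apply the identity ConfigMap directly. The data keys below match the Cluster Identity section; the flux-system namespace is an assumption for this sketch:

```yaml
# Applied once at bootstrap by the provisioning tool of your choice
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-identity
  namespace: flux-system
data:
  CLUSTER_NAME: "us-east-prod-01"
  CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
  ENVIRONMENT: "prod"
  INTERNAL_API_URL: "https://internal-api.internal.example.com"
```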
Why separate read-only and CRUD modes?
The two modes serve fundamentally different access patterns:
| Mode | Consumers | Pattern | Scaling |
|---|---|---|---|
| read-only | Hundreds/thousands of clusters polling | High concurrency, small payloads, predictable load | Multi-replica, horizontal scaling |
| crud | Operators, CLI, CI/CD pipelines | Low concurrency, larger payloads, bursty | Single replica or small deployment |
Separating them gives you:
- Independent scaling — read replicas scale with fleet size; CRUD does not need to
- Security boundary — read-only instances never accept writes; separate tokens for each
- Blast radius — a CRUD deployment issue does not affect cluster polling
- Simpler operations — read-only instances are stateless and disposable
Operational Questions
What happens if the API goes down?
Clusters keep running. They continue reconciling from their last-known state. Existing HelmReleases, Namespaces, and ClusterRoleBindings all remain in place and healthy.
What stops working:
- New configuration changes are not picked up until the API recovers
- The ResourceSetInputProvider status shows not-ready
- Alerts should fire based on provider status conditions
This is a key advantage over push-based models — API downtime is an inconvenience, not an outage.
How do I roll back a bad change?
- Revert the API data — update the cluster document or catalog entry back to the previous state
- Wait for next poll — or force an immediate reconcile with kubectl annotate
- Flux reconciles — the ResourceSet re-renders with the reverted data, and Flux applies the diff
For template changes (in Git), use standard Git revert workflows. Flux picks up the reverted template on next reconcile.
How do I handle secrets?
The patches object is for non-sensitive configuration only (replica counts, feature flags, resource limits). For secrets:
- Use the External Secrets Operator to sync secrets from a vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc.)
- Reference Kubernetes Secrets in HelmRelease valuesFrom instead of ConfigMaps
- Add an external-secrets resource type to the API to manage ESO ExternalSecret resources via the same phone-home pattern
Can I use this with existing Flux installations?
Yes. The ResourceSetInputProvider and ResourceSet are standard Flux Operator CRDs. They coexist with existing GitRepositories, HelmRepositories, Kustomizations, and HelmReleases.
You can adopt incrementally:
- Install the Flux Operator alongside existing Flux controllers
- Deploy providers and ResourceSets for one resource type (e.g., namespaces)
- Migrate additional resource types as confidence grows
- Existing Git-based Flux resources continue working unchanged
How does this compare to Helm value files per cluster?
| Aspect | Helm values per cluster | API-driven patching |
|---|---|---|
| Storage | YAML files in Git (one per cluster, or overlays) | Structured data in the API |
| Updating 100 clusters | 100 file edits + PR | Batch API call |
| Per-cluster customization | Overlay hierarchy (can get deeply nested) | Flat patches object per cluster per component |
| Dynamic values | Requires scripted Git commits | API call → next poll → reconciled |
| Review requirement | Git PR for every change (even scaling) | API auth for data changes; Git PR for template changes |
| Merge conflicts | Possible with concurrent PRs | Not possible — API handles concurrency |
Can I extend this beyond platform components?
Yes. The architecture is designed for it. Any Kubernetes resource type can be managed this way. See the Extending chapter for a step-by-step walkthrough.
Ideas that organizations have considered:
- Network policies
- Resource quotas and limit ranges
- External secrets
- Ingress routes and TLS certificates
- Custom CRDs specific to the organization
- Monitoring and alerting configurations (PrometheusRule, ServiceMonitor)
Each follows the same pattern: schema, endpoint, provider, template.
Performance & Scale
How many clusters can this support?
The API is stateless and the per-request cost is minimal (one data store read + one merge). Rough numbers:
| Clusters | Resource Types | Poll Interval | Requests/sec |
|---|---|---|---|
| 100 | 3 | 5 min | 1 |
| 500 | 3 | 5 min | 5 |
| 1,000 | 3 | 5 min | 10 |
| 5,000 | 3 | 5 min | 50 |
| 10,000 | 5 | 5 min | 167 |
Even at 10,000 clusters with 5 resource types, the load is ~167 req/sec — well within the capacity of a small API deployment. Add read replicas for HA, not for throughput.
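The arithmetic behind the table is straightforward: each cluster issues one request per resource type per poll interval. A quick sketch:

```python
def poll_rate(clusters: int, resource_types: int, interval_seconds: int) -> float:
    """Steady-state API requests/sec for a fleet polling on a fixed interval."""
    return clusters * resource_types / interval_seconds

# 10,000 clusters x 5 resource types on a 5-minute (300 s) interval
print(round(poll_rate(10_000, 5, 300)))  # -> 167
```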
What is the latency from API change to cluster reconciliation?
It depends on the poll interval configured on the ResourceSetInputProvider. The default is 5 minutes. For faster feedback:
- Set fluxcd.controlplane.io/reconcileEvery: "30s" on the provider (the demo uses this)
- Force immediate reconciliation by annotating the provider with fluxcd.controlplane.io/requestedAt
- In practice, 5-minute intervals are fine for production — platform component changes are not latency-sensitive
Does every cluster get the full catalog?
No. Each cluster only receives the components, namespaces, and rolebindings assigned to it in the cluster document. The API computes a cluster-specific response — a cluster with 5 components gets 5 inputs, not the entire catalog.