Flux ResourceSet — API-Driven GitOps
flux-resourceset is a repo containing an example API service that powers an API-driven, GitOps-based model for managing Kubernetes clusters at enterprise scale. Instead of a central management cluster pushing configuration to child clusters, each child cluster pulls its own desired state from this API — and Flux reconciles the difference.
This is still a GitOps-based model: the ResourceSet templates that define how resources are rendered live in Git and follow standard GitOps review workflows. The API adds a dynamic data layer on top — what each cluster should run is served by the API, while how it is deployed is governed by version-controlled templates. The combination preserves GitOps principles (declarative, versioned, continuously reconciled) while adding the operational flexibility that enterprise multi-cluster management demands.
The Problem
Traditional enterprise Kubernetes platforms suffer from:
- Slow provisioning — cluster creation taking weeks, not minutes
- State divergence — configuration management tools (Ansible, Terraform, Puppet, Salt, or custom automation scripts), CMDB databases, and actual cluster state drifting apart over time
- Manual release ceremonies — PRs, approvals, and tier-by-tier rollouts for every platform component change
- Scaling bottlenecks — centralized push-based management that breaks down at hundreds of clusters
- Infrastructure lock-in — tooling that assumes a specific cloud provider or VM provisioner, making hybrid and multi-cloud deployments painful
The Solution
This project implements a resource-driven, pull-based architecture where:
- A central API (this service) is the single source of truth for cluster configuration
- Each cluster’s Flux Operator phones home to fetch its desired state
- ResourceSet templates render Kubernetes resources from the API response
- Flux continuously reconciles — any API change is automatically applied
This model is infrastructure-agnostic. It works on bare-metal on-premises data centers, private cloud, public cloud (AWS EKS, Azure AKS, GCP GKE), edge locations, or any hybrid combination. The only requirement is that each cluster can make outbound HTTPS requests to the API endpoint.
```mermaid
graph TB
    API["flux-resourceset API<br/>(single source of truth)"]

    subgraph "Child Cluster 1"
        P1["ResourceSetInputProvider<br/>(polls every 5m)"]
        RS1["ResourceSet<br/>(renders templates)"]
        K1["Flux Kustomize/Helm<br/>(reconciles)"]
        P1 -->|"fetches inputs"| RS1
        RS1 -->|"creates resources"| K1
    end

    subgraph "Child Cluster 2"
        P2["ResourceSetInputProvider"]
        RS2["ResourceSet"]
        K2["Flux Kustomize/Helm"]
        P2 --> RS2 --> K2
    end

    subgraph "Child Cluster N"
        PN["ResourceSetInputProvider"]
        RSN["ResourceSet"]
        KN["Flux Kustomize/Helm"]
        PN --> RSN --> KN
    end

    P1 -->|"GET /clusters/{dns}/platform-components"| API
    P2 -->|"GET /clusters/{dns}/namespaces"| API
    PN -->|"GET /clusters/{dns}/rolebindings"| API
```
Key Upstream Projects
This architecture builds on two open-source projects:
- Flux Operator — provides the ResourceSet and ResourceSetInputProvider CRDs that power the templating and phone-home polling. The `ExternalService` input type is the foundation this architecture is built on.
- Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code. Firestone defines the resource schemas (`cluster`, `platform_component`, `namespace`, `rolebinding`) that drive code generation for this project.
What This Service Does
flux-resourceset reads cluster configuration data, merges per-cluster overrides with catalog defaults, and returns responses in the {"inputs": [...]} format that the Flux Operator’s ResourceSetInputProvider (ExternalService type) requires.
Each resource type gets its own endpoint:
| Endpoint | What It Returns |
|---|---|
| `GET /api/v2/flux/clusters/{dns}/platform-components` | HelmRelease + HelmRepository + ConfigMap inputs per component |
| `GET /api/v2/flux/clusters/{dns}/namespaces` | Namespace inputs with labels and annotations |
| `GET /api/v2/flux/clusters/{dns}/rolebindings` | ClusterRoleBinding inputs with subjects |
| `GET /api/v2/flux/clusters` | Cluster list for management plane provisioning |
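As an illustration of consuming these endpoints from client tooling, the sketch below fetches one resource type with Python's standard library. The host, cluster DNS, and token values are placeholders, not part of the project.

```python
import json
import urllib.request

# Placeholder values -- substitute your own API host, cluster DNS, and token.
API = "http://localhost:8080"
CLUSTER_DNS = "demo-cluster-01.k8s.example.com"
TOKEN = "my-token"

def build_url(api: str, cluster_dns: str, resource: str) -> str:
    """Build the Flux read-endpoint URL for one resource type."""
    return f"{api}/api/v2/flux/clusters/{cluster_dns}/{resource}"

def fetch_inputs(resource: str) -> list:
    """Fetch and unwrap the {"inputs": [...]} payload for one resource type."""
    req = urllib.request.Request(
        build_url(API, CLUSTER_DNS, resource),
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["inputs"]

# Against a running instance:
# for item in fetch_inputs("platform-components"):
#     print(item["id"])
```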
Key Concepts
| Concept | Description |
|---|---|
| Phone-home model | Clusters pull config; the API never pushes. Scales to thousands of clusters. |
| Resource-driven development | Define resources (clusters, components, namespaces) as structured data. Templates turn data into Kubernetes manifests. |
| Dynamic patching | Per-cluster, per-component value overrides without touching Git. Change a replica count in the API and watch Flux reconcile. |
| Catalog + overrides | Platform components live in a catalog with defaults. Each cluster can override oci_tag, component_path, or inject custom patches. |
| ExternalService contract | All responses follow {"inputs": [{"id": "...", ...}]} — the format Flux Operator requires. |
| Infrastructure-agnostic | Works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |
Quick Start
```sh
cd flux-resourceset
make demo      # Creates kind cluster, installs Flux, deploys API + demo data
make cli-demo  # Runs the CLI demo flow end-to-end
```
See the Local Demo chapter for full details.
System Overview
The architecture separates concerns into three layers: the data plane (where cluster config lives), the API plane (this service), and the cluster plane (Flux running on each child cluster).
High-Level Architecture
```mermaid
graph TB
    subgraph "Data Layer"
        DB[("Data Store<br/>(SQLite / In-Memory)")]
    end

    subgraph "API Layer"
        READ["flux-resourceset<br/>(read-only mode)"]
        CRUD["flux-resourceset<br/>(CRUD mode)"]
        CLI["flux-resourceset-cli"]
    end

    subgraph "Cluster Layer"
        subgraph "Child Cluster"
            RSIP["ResourceSetInputProvider<br/>type: ExternalService"]
            RS["ResourceSet<br/>(templates)"]
            HR["HelmRelease / Kustomization"]
            NS["Namespace"]
            RB["ClusterRoleBinding"]
        end
    end

    DB -->|"read"| READ
    DB <-->|"read/write"| CRUD
    CLI -->|"CRUD operations"| CRUD
    RSIP -->|"polls"| READ
    RSIP -->|"inputs"| RS
    RS -->|"renders"| HR
    RS -->|"renders"| NS
    RS -->|"renders"| RB
```
Component Roles
Data Store
By default, this is SQLite (configured via DATABASE_URL). For lightweight/dev workflows it can run in-memory (STORE_BACKEND=memory) using data/seed.json as initial state.
The store holds four logical resource sets:
- clusters — each cluster’s full configuration: assigned components, namespaces, rolebindings, and per-component patches
- platform_components — component catalog entries with defaults, OCI URLs/tags, and dependencies
- namespaces — reusable namespace definitions referenced by clusters
- rolebindings — reusable RBAC rolebinding definitions referenced by clusters
API Service (flux-resourceset)
A Rust service built with axum that operates in two modes:
| Mode | Purpose | Endpoints |
|---|---|---|
| `read-only` | Flux polling — high concurrency, minimal resource usage | `/api/v2/flux/...`, `/health`, `/ready` |
| `crud` | Operator/CLI access — full CRUD for managing cluster state | All read endpoints + `/clusters`, `/platform_components`, `/namespaces`, `/rolebindings` |
The read-only mode is designed to run as a multi-replica deployment serving cluster polls. The CRUD mode is for operators and CI/CD pipelines that need to modify cluster configuration.
CLI (flux-resourceset-cli)
A command-line tool for interacting with the CRUD API. Supports listing, creating, and patching resources. Used for demos and operational workflows.
Flux Operator (on each cluster)
Each cluster runs:
- ResourceSetInputProvider — calls the API on a schedule and fetches `{"inputs": [...]}`
- ResourceSet — takes the inputs and renders Kubernetes manifests from templates
- Flux controllers — reconcile the rendered manifests (HelmRelease, Kustomization, Namespace, etc.)
Data Flow
```mermaid
sequenceDiagram
    participant Operator as Operator / CLI
    participant API as flux-resourceset (CRUD)
    participant DB as Data Store
    participant ReadAPI as flux-resourceset (read-only)
    participant Cluster as Child Cluster (Flux)

    Operator->>API: PATCH /clusters/demo-cluster-01<br/>{"patches": {"podinfo": {"replicaCount": "3"}}}
    API->>DB: Update cluster document
    API-->>Operator: 200 OK
    Note over Cluster: Every 5 minutes (or on-demand)
    Cluster->>ReadAPI: GET /api/v2/flux/clusters/{dns}/platform-components
    ReadAPI->>DB: Fetch cluster + catalog docs
    DB-->>ReadAPI: Cluster doc + component catalog
    ReadAPI->>ReadAPI: Merge overrides with catalog defaults
    ReadAPI-->>Cluster: {"inputs": [{...component with patches...}]}
    Cluster->>Cluster: ResourceSet renders HelmRelease with patched values
    Cluster->>Cluster: Flux reconciles — podinfo scales to 3 replicas
```
Why This Architecture
vs. Push-Based (ArgoCD ApplicationSets, central Flux)
| Concern | Push-based | Phone-home (this) |
|---|---|---|
| Scalability | Management cluster must maintain connections to all children | Each cluster independently polls; API is stateless |
| Failure blast radius | Management cluster outage = all clusters lose reconciliation | API outage = clusters keep running last-known state |
| Network requirements | Management cluster needs outbound access to all clusters | Clusters need outbound access to one API endpoint |
| Credential management | Management cluster holds kubeconfigs for all clusters | Each cluster holds one bearer token |
vs. Git-per-Cluster
| Concern | Git-per-cluster | API-driven (this) |
|---|---|---|
| Updating 500 clusters | 500 PRs or complex monorepo tooling | One API call to update the component catalog |
| Per-cluster overrides | Branch strategies or overlay directories | First-class patches object per cluster |
| Audit trail | Git history | API audit log + Git history for templates |
| Dynamic response | Static YAML files | Merge logic computes cluster-specific state |
vs. Direct Kubernetes API Access
A common question is: why not have operators kubectl apply directly, or build tooling that talks to the Kubernetes API on each cluster? See the FAQ for a detailed answer. The short version: a purpose-built API gives you a single control point with business logic, validation, audit logging, and integration hooks — things the raw Kubernetes API does not provide at fleet scale.
Infrastructure Agnostic
This architecture has no dependency on a specific cloud provider, VM provisioner, or Kubernetes distribution. The phone-home pattern requires only one thing: outbound HTTPS from each cluster to the API.
```mermaid
graph TB
    API["flux-resourceset API"]

    subgraph "On-Premises Data Center"
        OP1["Bare-metal cluster"]
        OP2["VMware vSphere cluster"]
    end

    subgraph "Public Cloud"
        AWS["AWS EKS"]
        AZ["Azure AKS"]
        GCP["GCP GKE"]
    end

    subgraph "Edge"
        E1["Edge location 1"]
        E2["Edge location 2"]
    end

    OP1 & OP2 -->|"HTTPS"| API
    AWS & AZ & GCP -->|"HTTPS"| API
    E1 & E2 -->|"HTTPS"| API
```
| Environment | How It Works |
|---|---|
| On-prem bare metal | Clusters provisioned via PXE boot, cloud-init, or immutable OS images. Flux bootstrap manifests pre-installed or applied post-boot. |
| On-prem VMs | VMware, KVM, Hyper-V, or any hypervisor. Same bootstrap pattern — inject identity, let Flux phone home. |
| Public cloud managed K8s | EKS, AKS, GKE — deploy Flux Operator as an add-on or Helm chart. Providers and ResourceSets applied via GitOps or cluster bootstrap. |
| Edge / remote sites | Lightweight clusters (k3s, k0s, MicroK8s) at edge locations. Phone home over VPN or direct HTTPS. |
| Hybrid | Mix any of the above. Each cluster phones home to the same API regardless of where it runs. |
The cluster provisioning mechanism is completely decoupled from the platform component management. Whether you use Terraform, Crossplane, Cluster API, custom scripts, or manual provisioning — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop takes over.
Phone-Home Model
The phone-home model is the core architectural pattern. Every child cluster is self-managing — it phones home to the API to discover its desired state, then reconciles locally. The provisioning layer’s only job is creating the cluster infrastructure and injecting a bootstrap identity. After that, the child cluster is autonomous.
How It Works
```mermaid
sequenceDiagram
    participant Mgmt as Management Cluster
    participant VM as Child Cluster VMs
    participant Flux as Flux Operator
    participant API as flux-resourceset API

    Mgmt->>VM: Provision cluster infrastructure<br/>Inject cluster-identity ConfigMap
    VM->>Flux: Cluster boots → Flux Operator starts
    Flux->>Flux: Reads cluster-identity ConfigMap<br/>(CLUSTER_NAME, CLUSTER_DNS, ENVIRONMENT)
    loop Every reconcile interval
        Flux->>API: GET /clusters/{CLUSTER_DNS}/platform-components
        API-->>Flux: {"inputs": [...components...]}
        Flux->>Flux: ResourceSet renders HelmRelease per component
        Flux->>Flux: Flux reconciles rendered resources
        Flux->>API: GET /clusters/{CLUSTER_DNS}/namespaces
        API-->>Flux: {"inputs": [...namespaces...]}
        Flux->>Flux: ResourceSet renders Namespace resources
        Flux->>API: GET /clusters/{CLUSTER_DNS}/rolebindings
        API-->>Flux: {"inputs": [...bindings...]}
        Flux->>Flux: ResourceSet renders ClusterRoleBinding resources
    end
    Note over Mgmt: Management cluster is out of the loop<br/>for all platform component management
```
Bootstrap Flow
The bootstrap sequence is designed so that every cluster starts identically and differentiates itself only through the API response:
1. Cluster provisioning — The infrastructure layer creates the cluster. This could be VMs from immutable OS images, cloud-managed Kubernetes (EKS, AKS, GKE), bare-metal nodes via PXE boot, or any other provisioning method. The Flux Operator bootstrap manifests are pre-installed in the image or applied post-boot.

2. Identity injection — A `cluster-identity` ConfigMap is the only cluster-specific data injected during provisioning:

   ```yaml
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cluster-identity
     namespace: flux-system
   data:
     CLUSTER_NAME: "us-east-prod-01"
     CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
     ENVIRONMENT: "prod"
     INTERNAL_API_URL: "https://internal-api.internal.example.com"
   ```

3. Flux bootstrap — The cluster boots. Pre-installed or applied manifests start the Flux Operator and deploy the ResourceSetInputProviders + ResourceSets.

4. Phone home — Each ResourceSetInputProvider calls the API using the cluster’s DNS name from the identity ConfigMap. The API returns that cluster’s specific configuration.

5. Self-reconciliation — Flux renders and reconciles. From this point forward, the cluster is self-managing.
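To make the identity-to-URL wiring concrete, here is a small Python sketch (illustrative only — the real polling is done by the Flux Operator, and the helper below is not part of it). The dictionary keys mirror the `data` keys of the cluster-identity ConfigMap shown above.

```python
# Illustrative only: the real polling is done by the Flux Operator.
# The identity dict mirrors the cluster-identity ConfigMap's data keys.

def phone_home_urls(identity: dict) -> dict:
    """Map each resource type to the URL its provider polls."""
    base = (
        f'{identity["INTERNAL_API_URL"]}'
        f'/api/v2/flux/clusters/{identity["CLUSTER_DNS"]}'
    )
    return {
        resource: f"{base}/{resource}"
        for resource in ("platform-components", "namespaces", "rolebindings")
    }
```

Every cluster runs the same code; only the injected identity values differ, which is what lets clusters start identically and differentiate purely through the API response.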
What Happens When the API Is Unreachable
The phone-home model degrades gracefully:
| Scenario | Cluster Behavior |
|---|---|
| API down for minutes | ResourceSetInputProvider goes not-ready. Existing Flux resources continue reconciling from cached state. No disruption. |
| API down for hours | Same — clusters keep running. They just cannot pick up new configuration changes. |
| API returns changed data | On next successful poll, ResourceSet re-renders. Flux applies the diff. |
| API returns empty inputs | Flux garbage-collects all resources the ResourceSet previously created. This is the decommission path. |
Separation of Concerns
```mermaid
graph LR
    subgraph "Provisioning Layer"
        A["Cluster Provisioning<br/>(Terraform, Cluster API,<br/>cloud CLI, PXE, etc.)"]
        B["DNS / Networking<br/>Setup"]
        C["Identity Injection<br/>(cluster-identity ConfigMap)"]
    end

    subgraph "API Layer"
        D["Single Source of Truth<br/>for all cluster configuration"]
    end

    subgraph "Child Cluster"
        E["Platform component<br/>deployment & reconciliation"]
        F["Namespace & RBAC<br/>management"]
    end

    A --> C
    B --> C
    D -.->|"polled by"| E
    D -.->|"polled by"| F
```
The provisioning layer never deploys platform components to child clusters. It creates infrastructure and injects identity. The child cluster owns its own desired state by polling the API. This separation means the provisioning tooling (whether Terraform, Cluster API, Crossplane, custom scripts, or a management cluster) has no ongoing role in platform component management.
Per-Resource-Type Providers
Each resource type gets its own ResourceSetInputProvider + ResourceSet pair. This separation ensures:
- Independent reconciliation — a namespace change does not trigger platform component re-rendering
- Independent failure — if one provider fails, others continue working
- Clear templates — each ResourceSet template is focused on one resource type
| Resource Type | Provider Name | Endpoint |
|---|---|---|
| Platform components | platform-components | /api/v2/flux/clusters/{dns}/platform-components |
| Namespaces | namespaces | /api/v2/flux/clusters/{dns}/namespaces |
| Role bindings | rolebindings | /api/v2/flux/clusters/{dns}/rolebindings |
All providers are pre-installed in every cluster’s bootstrap manifests. The cluster does not need to know what resource types exist — it polls all of them from boot.
Resource-Driven Development
Resource-driven development is the design philosophy behind this architecture. Instead of writing imperative scripts or maintaining per-cluster YAML, you define resources as structured data and let templates + reconciliation handle the rest.
The Idea
Every entity in the platform is a resource with a schema:
```mermaid
erDiagram
    CLUSTER ||--o{ COMPONENT_REF : "has platform_components"
    CLUSTER ||--o{ NAMESPACE_REF : "has namespaces"
    CLUSTER ||--o{ ROLEBINDING_REF : "has rolebindings"
    CLUSTER ||--o{ PATCH : "has patches"
    COMPONENT_REF }o--|| CATALOG_ENTRY : "references"
    NAMESPACE_REF }o--|| NAMESPACE_DEF : "references"
    ROLEBINDING_REF }o--|| ROLEBINDING_DEF : "references"

    CLUSTER {
        string id PK
        string cluster_name
        string cluster_dns
        string environment
    }
    COMPONENT_REF {
        string id FK
        boolean enabled
        string oci_tag "nullable override"
        string component_path "nullable override"
    }
    CATALOG_ENTRY {
        string id PK
        string component_path
        string component_version
        string oci_url
        string oci_tag
        boolean cluster_env_enabled
        string[] depends_on
    }
    NAMESPACE_REF {
        string id
    }
    ROLEBINDING_REF {
        string id
    }
    NAMESPACE_DEF {
        string id PK
        object labels
        object annotations
    }
    ROLEBINDING_DEF {
        string id PK
        string role
        object[] subjects
    }
    PATCH {
        string component_id FK
        object key_values
    }
```
`cluster.namespaces` and `cluster.rolebindings` are reference arrays (`id` only). Full namespace/rolebinding payloads live in their own definition resources and are resolved during merge.
Resources are declared, not scripted. The API merges them. Templates render them. Flux reconciles them.
Three-Layer Separation
The architecture cleanly separates what from how from where:
| Layer | Responsibility | Who Owns It | Example |
|---|---|---|---|
| Data | What should exist on each cluster | Platform operators via API/CLI | “Cluster X should have cert-manager v1.14.0 with 3 replicas” |
| Templates | How resources are rendered into Kubernetes manifests | Platform engineers via Git | ResourceSet template that turns an input into a HelmRelease |
| Reconciliation | Where and when resources are applied | Flux Operator (automated) | Flux detects drift and applies the diff |
This separation means:
- Operators change cluster state by updating data (API calls), not by writing YAML
- Engineers change how things are deployed by updating templates (Git PRs), not by touching every cluster
- Flux handles the convergence loop — no manual `kubectl apply`, no configuration management playbooks, no custom deployment scripts
How a Change Flows Through the System
Example: Adding a new platform component to 50 clusters
Traditional approach:
- Write Helm values for 50 clusters (or complex overlay structure)
- Open PR to add component to each cluster’s directory
- Wait for PR review and merge
- Watch tier-by-tier rollout
- Debug failures per-cluster
Resource-driven approach:
- Add the component to the catalog (one API call)
- Add a component reference to each cluster’s `platform_components` array (one API call per cluster, or a batch script)
- Done — Flux picks it up on next poll
```mermaid
flowchart LR
    A["API call:<br/>Add component<br/>to catalog"] --> B["API call:<br/>Add component_ref<br/>to cluster doc"]
    B --> C["Next poll cycle:<br/>Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRepo + HelmRelease"]
    D --> E["Flux reconciles:<br/>component installed"]
```
Example: Patching a component value on one cluster
```mermaid
flowchart LR
    A["CLI:<br/>patch-component podinfo<br/>--set replicaCount=3"] --> B["API updates<br/>cluster.patches.podinfo"]
    B --> C["Provider fetches<br/>updated inputs"]
    C --> D["ResourceSet renders<br/>HelmRelease with<br/>valuesFrom ConfigMap"]
    D --> E["Flux reconciles:<br/>podinfo scales to 3"]
```
No Git PR. No pipeline. The data change flows through the system automatically.
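The data change in this flow is just an HTTP call. As an illustrative sketch with Python's standard library (the host, token, and cluster ID are placeholders; the PATCH shape mirrors the sequence diagram earlier in this document):

```python
import json
import urllib.request

def build_patch_body(component: str, values: dict) -> dict:
    """Shape the per-cluster patches object for one component."""
    return {"patches": {component: values}}

def patch_cluster(api: str, token: str, cluster_id: str,
                  component: str, values: dict) -> int:
    """Send the data change as a PATCH request (placeholder host/token)."""
    req = urllib.request.Request(
        f"{api}/clusters/{cluster_id}",
        data=json.dumps(build_patch_body(component, values)).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Against a running CRUD-mode instance:
# patch_cluster("http://localhost:8080", "my-token",
#               "demo-cluster-01", "podinfo", {"replicaCount": "3"})
```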
Resource Schemas as API Contracts
Each resource type has a defined schema managed via Firestone — a resource-based API specification generator that converts JSON Schema definitions into OpenAPI specs, CLI tools, and downstream code generation artifacts.
The schemas:
- cluster (v2) — the full cluster document with arrays of component refs, namespace refs, rolebinding refs, and a patches object
- platform_component (v1) — the catalog entry with OCI URLs, versions, dependencies
- namespace (v1) — namespace with labels and annotations
- rolebinding (v1) — role binding with subjects
These schemas are the single source of truth for:
- OpenAPI spec generation (`openapi/openapi.yaml`) — used for API documentation and client generation
- Rust model generation (`src/models/`, `src/apis/`) — the structs the API service uses
- CLI code generation (`src/generated/cli/`) — the CLI commands for each resource type
When a schema changes, `make generate` regenerates all downstream artifacts. This ensures the API, CLI, and documentation stay in sync with the resource definitions. See the Firestone documentation for the full schema language and generator options.
Benefits for Enterprise
Auditability
Every state change goes through the API. The API can log who changed what, when. Combined with Git history for templates, you have a full audit trail.
Consistency
The merge logic guarantees that every cluster gets a consistent, computed response. No hand-edited YAML files that drift.
Velocity
Operators can change cluster state in seconds. No PR cycles for operational changes. Reserve Git PRs for template/structural changes.
Testability
Because resources are structured data, you can:
- Validate schemas before applying
- Unit test merge logic
- Integration test API responses against the ExternalService contract
- Dry-run template rendering
Separation of Permissions
- Template changes (how things deploy) require Git PR review
- Data changes (what is deployed where) require API auth tokens
- Reconciliation is automated — no human in the loop
API Reference
All endpoints return JSON. Flux-facing endpoints return the {"inputs": [...]} structure required by the ResourceSetInputProvider ExternalService contract. CRUD endpoints follow standard REST conventions.
Authentication
All endpoints require a Bearer token in the Authorization header.
| Mode | Read Token | Write Token |
|---|---|---|
| `read-only` | `AUTH_TOKEN` env var | N/A (no write endpoints) |
| `crud` | `AUTH_TOKEN` env var | `CRUD_AUTH_TOKEN` env var (falls back to `AUTH_TOKEN`) |
```sh
curl -H "Authorization: Bearer $AUTH_TOKEN" http://localhost:8080/health
```
Flux Read Endpoints
These endpoints are consumed by Flux Operator’s ResourceSetInputProvider. They follow the ExternalService contract.
ExternalService Contract
Every response must satisfy:
- Top-level `inputs` array
- Each item has a unique string `id`
- Response body under 900 KiB
- All JSON value types (strings, numbers, booleans, arrays, objects) are preserved in templates
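A client-side check of these rules can be sketched in a few lines of Python (illustrative only; the Flux Operator performs its own validation):

```python
import json

def validate_external_service_response(body: bytes) -> list:
    """Return a list of ExternalService contract violations (empty = valid)."""
    problems = []
    if len(body) >= 900 * 1024:
        problems.append("response body is 900 KiB or larger")
    doc = json.loads(body)
    inputs = doc.get("inputs")
    if not isinstance(inputs, list):
        problems.append('missing top-level "inputs" array')
        return problems
    ids = [item.get("id") for item in inputs]
    if any(not isinstance(i, str) for i in ids):
        problems.append("every input needs a string id")
    if len(set(ids)) != len(ids):
        problems.append("input ids must be unique")
    return problems
```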
GET /api/v2/flux/clusters/{cluster_dns}/platform-components
Returns platform components assigned to a cluster, with catalog defaults merged and per-cluster overrides applied.
Path parameters:
| Parameter | Type | Description |
|---|---|---|
| `cluster_dns` | string | The cluster’s DNS name (e.g., `demo-cluster-01.k8s.example.com`) |
Response:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "cert-manager",
      "component_version": "latest",
      "cluster_env_enabled": false,
      "depends_on": [],
      "enabled": true,
      "patches": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      },
      "source": {
        "oci_url": "https://charts.jetstack.io",
        "oci_tag": "latest"
      }
    }
  ]
}
```
Field reference:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique component identifier, used as Flux resource name suffix |
| `component_path` | string | Chart name or path within OCI artifact. Cluster override takes precedence over catalog default |
| `component_version` | string | Upstream version. `"latest"` means no version pinning |
| `cluster_env_enabled` | boolean | If true, ResourceSet template appends `/{environment}` to the path |
| `depends_on` | string[] | Component IDs that must be healthy first. Empty = no dependencies |
| `enabled` | boolean | `false` causes Flux to garbage-collect the component |
| `patches` | object | Per-cluster key-value overrides, injected via HelmRelease `valuesFrom` |
| `cluster.name` | string | Cluster identifier |
| `cluster.dns` | string | Cluster FQDN |
| `cluster.environment` | string | Tier: `dev`, `qa`, `uat`, `prod` |
| `source.oci_url` | string | Helm repository or OCI registry URL |
| `source.oci_tag` | string | Chart/artifact version tag. Cluster override takes precedence |
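For client tooling, the field reference can be mirrored as a typed structure. The following is an illustrative Python model, not generated project code; the `dict[str, str]` type for `patches` follows the string-valued examples in this chapter.

```python
from typing import TypedDict

class ClusterBlock(TypedDict):
    name: str
    dns: str
    environment: str

class Source(TypedDict):
    oci_url: str
    oci_tag: str

class ComponentInput(TypedDict):
    """One item of the platform-components "inputs" array."""
    id: str
    component_path: str
    component_version: str
    cluster_env_enabled: bool
    depends_on: list[str]
    enabled: bool
    patches: dict[str, str]
    cluster: ClusterBlock
    source: Source
```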
GET /api/v2/flux/clusters/{cluster_dns}/namespaces
Returns namespaces assigned to a cluster.
Response:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "labels": { "app": "cert-manager" },
      "annotations": {},
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}
```
GET /api/v2/flux/clusters/{cluster_dns}/rolebindings
Returns role bindings assigned to a cluster.
Response:
```json
{
  "inputs": [
    {
      "id": "platform-admins",
      "role": "cluster-admin",
      "subjects": [
        {
          "kind": "Group",
          "name": "platform-team",
          "apiGroup": "rbac.authorization.k8s.io"
        }
      ],
      "cluster": {
        "name": "demo-cluster-01",
        "dns": "demo-cluster-01.k8s.example.com",
        "environment": "dev"
      }
    }
  ]
}
```
GET /api/v2/flux/clusters
Returns all clusters. Used by management cluster provisioners.
Response:
```json
{
  "inputs": [
    {
      "id": "demo-cluster-01",
      "cluster_name": "demo-cluster-01",
      "cluster_dns": "demo-cluster-01.k8s.example.com",
      "environment": "dev"
    }
  ]
}
```
CRUD Endpoints
Available when API_MODE=crud. These follow standard REST patterns.
Clusters
| Method | Path | Description |
|---|---|---|
| GET | `/clusters` | List all clusters |
| POST | `/clusters` | Create a cluster |
| GET | `/clusters/{id}` | Get cluster by ID |
| PUT | `/clusters/{id}` | Update a cluster |
| DELETE | `/clusters/{id}` | Delete a cluster |
Cluster payload notes:
- `platform_components[]` entries are references with per-cluster override fields (`id`, `enabled`, optional `oci_tag`, optional `component_path`).
- `namespaces[]` entries are reference objects (`id` only).
- `rolebindings[]` entries are reference objects (`id` only).
Platform Components
| Method | Path | Description |
|---|---|---|
| GET | `/platform_components` | List all catalog components |
| POST | `/platform_components` | Create a catalog entry |
| GET | `/platform_components/{id}` | Get component by ID |
| PUT | `/platform_components/{id}` | Update a catalog entry |
| DELETE | `/platform_components/{id}` | Delete a catalog entry |
Namespaces
| Method | Path | Description |
|---|---|---|
| GET | `/namespaces` | List all namespace definitions |
| POST | `/namespaces` | Create a namespace definition |
| GET | `/namespaces/{id}` | Get namespace by ID |
| PUT | `/namespaces/{id}` | Update a namespace definition |
| DELETE | `/namespaces/{id}` | Delete a namespace definition |
Rolebindings
| Method | Path | Description |
|---|---|---|
| GET | `/rolebindings` | List all rolebinding definitions |
| POST | `/rolebindings` | Create a rolebinding definition |
| GET | `/rolebindings/{id}` | Get rolebinding by ID |
| PUT | `/rolebindings/{id}` | Update a rolebinding definition |
| DELETE | `/rolebindings/{id}` | Delete a rolebinding definition |
Service Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe — returns `{"status": "ok"}` |
| GET | `/ready` | Readiness probe endpoint — currently returns `{"status": "ok"}` |
| GET | `/openapi.yaml` | OpenAPI 3.0 specification document |
Error Responses
| Status | Condition |
|---|---|
| `401 Unauthorized` | Missing or invalid bearer token |
| `404 Not Found` | Cluster DNS or resource ID not found |
| `500 Internal Server Error` | Data store connection error |
Merge Logic
The merge logic is the critical path in the API. It takes raw cluster documents and catalog entries and produces the computed response that Flux consumes. Understanding the merge is key to understanding the entire system.
Platform Components Merge
This is the most complex merge. It combines three data sources into a single response:
```mermaid
flowchart TD
    A["Cluster Document"] --> D["Merge Logic"]
    B["Component Catalog"] --> D
    C["Cluster Patches"] --> D
    D --> E["Flux Response<br/>{inputs: [...]}"]
    A -.- A1["platform_components[]<br/>per-cluster overrides"]
    B -.- B1["Default oci_tag, component_path,<br/>oci_url, depends_on"]
    C -.- C1["patches[component_id]<br/>key-value overrides"]
```
Merge Rules
For each component in the cluster’s platform_components array:
| Field | Source | Rule |
|---|---|---|
| `id` | Cluster component ref | Passed through |
| `enabled` | Cluster component ref | Passed through |
| `component_path` | Cluster override OR catalog default | Cluster override wins if non-null |
| `component_version` | Catalog | Always from catalog |
| `cluster_env_enabled` | Catalog | Always from catalog (template handles path appending) |
| `source.oci_url` | Catalog | Always from catalog |
| `source.oci_tag` | Cluster override OR catalog default | Cluster override wins if non-null |
| `depends_on` | Catalog | Always from catalog |
| `patches` | Cluster `patches[component_id]` | Empty `{}` if no patches for this component |
| `cluster.name` | Cluster doc | From cluster’s `cluster_name` |
| `cluster.dns` | Cluster doc | From cluster’s `cluster_dns` |
| `cluster.environment` | Cluster doc | From cluster’s `environment` |
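The service implements this merge in Rust; purely as an illustration, the rules in the table can be sketched in Python as a single function over one component reference, its catalog entry, and the cluster document:

```python
def merge_component(ref: dict, catalog: dict, cluster: dict) -> dict:
    """Sketch of the merge rules: a non-null cluster override wins,
    otherwise the catalog default applies."""
    return {
        "id": ref["id"],
        "enabled": ref["enabled"],
        # Cluster override wins if non-null, else catalog default
        "component_path": ref.get("component_path") or catalog["component_path"],
        # Always from catalog
        "component_version": catalog.get("component_version"),
        "cluster_env_enabled": catalog.get("cluster_env_enabled"),
        "depends_on": catalog.get("depends_on", []),
        # Empty {} if no patches for this component
        "patches": cluster.get("patches", {}).get(ref["id"], {}),
        "cluster": {
            "name": cluster["cluster_name"],
            "dns": cluster["cluster_dns"],
            "environment": cluster["environment"],
        },
        "source": {
            "oci_url": catalog["oci_url"],
            "oci_tag": ref.get("oci_tag") or catalog["oci_tag"],
        },
    }
```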
Merge Example
Given this cluster document:
```json
{
  "cluster_name": "us-east-prod-01",
  "cluster_dns": "us-east-prod-01.k8s.example.com",
  "environment": "prod",
  "platform_components": [
    { "id": "cert-manager", "enabled": true, "oci_tag": null, "component_path": null },
    { "id": "grafana", "enabled": true, "oci_tag": "v1.0.0-1", "component_path": "observability/grafana/17.1.0" }
  ],
  "patches": {
    "grafana": { "GRAFANA_REPLICAS": "3" }
  }
}
```
And this catalog:
```json
[
  { "_id": "cert-manager", "component_path": "core/cert-manager/1.14.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": [] },
  { "_id": "grafana", "component_path": "observability/grafana/17.0.0", "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0", "depends_on": ["cert-manager"] }
]
```
The merge produces:
```json
{
  "inputs": [
    {
      "id": "cert-manager",
      "component_path": "core/cert-manager/1.14.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0" },
      "patches": {},
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    },
    {
      "id": "grafana",
      "component_path": "observability/grafana/17.1.0",
      "source": { "oci_url": "oci://registry/repo", "oci_tag": "v1.0.0-1" },
      "depends_on": ["cert-manager"],
      "patches": { "GRAFANA_REPLICAS": "3" },
      "cluster": { "name": "us-east-prod-01", "dns": "us-east-prod-01.k8s.example.com", "environment": "prod" }
    }
  ]
}
```
Notice:
- cert-manager uses catalog defaults for everything (cluster overrides are null)
- grafana uses cluster overrides for `oci_tag` (`v1.0.0-1`) and `component_path` (`observability/grafana/17.1.0`)
- grafana gets the per-cluster patch (`GRAFANA_REPLICAS: "3"`)
Namespaces Merge
Namespaces now use a reference + lookup model:
```mermaid
flowchart TD
    A["Cluster Document"] --> C["Merge Logic"]
    B["Namespace Definitions"] --> C
    C --> D["Flux Response"]
    A -.- A1["namespaces[]<br/>id references"]
    B -.- B1["id, labels, annotations"]
    D -.- D1["Each namespace gets<br/>cluster block nested in"]
```
Merge steps:
- Read `cluster.namespaces[]` as ID references.
- Resolve each ID from the namespace definitions store.
- Return the resolved namespace payload plus a nested `cluster` block (`name`, `dns`, `environment`).
- Any missing referenced IDs are skipped in Flux response generation.
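The steps above amount to a dictionary lookup with silent skipping — a minimal sketch in illustrative Python (field names are assumed from the examples in this document, not taken from the actual implementation):

```python
# Reference + lookup merge: resolve namespace IDs against the definitions
# store, skip missing IDs, and nest the cluster block into each result.
def resolve_namespaces(cluster, definitions):
    defs = {d["id"]: d for d in definitions}
    cluster_block = {
        "name": cluster["name"],
        "dns": cluster["dns"],
        "environment": cluster["environment"],
    }
    inputs = []
    for ns_id in cluster["namespaces"]:
        ns = defs.get(ns_id)
        if ns is None:
            continue  # missing references are skipped, not errored
        inputs.append({**ns, "cluster": cluster_block})
    return {"inputs": inputs}
```

The rolebindings merge in the next section follows the same shape with a different definitions store.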
Rolebindings Merge
Rolebindings follow the same pattern as namespaces:
- Read `cluster.rolebindings[]` as ID references.
- Resolve each ID from the rolebinding definitions store.
- Return the resolved rolebinding payload (`id`, `role`, `subjects[]`) plus a nested `cluster` block.
- Any missing referenced IDs are skipped in Flux response generation.
Why Merge Matters
The merge logic is what makes this system more than a simple proxy. It enables:
- Catalog defaults — define a component once, inherit everywhere
- Per-cluster overrides — pin a specific cluster to a hotfix version without affecting others
- Per-cluster patches — inject environment-specific values without touching the component definition
- Computed responses — the cluster gets exactly the state it needs, computed from multiple data sources
Without the merge, you would need to duplicate the full component definition per cluster — which is exactly the problem this architecture solves.
Configuration & Deployment
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `API_MODE` | no | `read-only` | Runtime mode: `read-only` or `crud` |
| `STORE_BACKEND` | no | `sqlite` | Data backend: `sqlite` or `memory` |
| `DATABASE_URL` | no | `sqlite://data/flux-resourceset.db?mode=rwc` | SQLite DSN when `STORE_BACKEND=sqlite` |
| `AUTH_TOKEN` | yes | — | Bearer token for read routes |
| `CRUD_AUTH_TOKEN` | no | `AUTH_TOKEN` | Bearer token for write routes in CRUD mode |
| `SEED_FILE` | no | `data/seed.json` | Seed data file loaded at startup |
| `OPENAPI_FILE` | no | `openapi/openapi.yaml` | OpenAPI document served at `/openapi.yaml` |
| `LISTEN_ADDR` | no | `0.0.0.0:8080` | Bind address |
| `RUST_LOG` | no | unset | Tracing filter directive |
Runtime Modes
read-only
The default mode. Serves only Flux read endpoints (/api/v2/flux/...) and service endpoints (/health, /ready, /openapi.yaml). Designed for high-concurrency polling from many clusters.
API_MODE=read-only AUTH_TOKEN=my-token cargo run
crud
Full CRUD mode. Includes all read endpoints plus REST endpoints for clusters, platform_components, namespaces, and rolebindings. Used by operators and CI/CD pipelines.
API_MODE=crud AUTH_TOKEN=read-token CRUD_AUTH_TOKEN=write-token cargo run
Production Deployment
Kubernetes Deployment (read-only)
apiVersion: apps/v1
kind: Deployment
metadata:
name: flux-api-read
namespace: flux-system
spec:
replicas: 2
selector:
matchLabels:
app: flux-api-read
template:
metadata:
labels:
app: flux-api-read
spec:
containers:
- name: flux-api
image: flux-resourceset:latest
ports:
- containerPort: 8080
env:
- name: API_MODE
value: "read-only"
- name: STORE_BACKEND
value: "sqlite"
- name: DATABASE_URL
value: "sqlite:///var/lib/flux-resourceset/flux-resourceset.db?mode=rwc"
- name: SEED_FILE
value: "/seed/seed.json"
- name: AUTH_TOKEN
valueFrom:
secretKeyRef:
name: internal-api-token
key: token
- name: RUST_LOG
value: "info"
resources:
requests:
cpu: 50m
memory: 32Mi
limits:
cpu: 200m
memory: 64Mi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 2
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 2
periodSeconds: 5
Resource requests are deliberately small — Rust’s efficiency means this service uses minimal resources. Run 2+ replicas for high availability, not for throughput.
Performance Characteristics
Each request does a data store lookup and a merge. Expected latency is sub-millisecond for the in-memory backend and typically single-digit milliseconds for SQLite on local SSD.
| Clusters | Poll Interval | Requests/sec |
|---|---|---|
| 50 | 5 min | 0.17 |
| 200 | 5 min | 0.67 |
| 1,000 | 5 min | 3.3 |
| 5,000 | 5 min | 16.7 |
Even at 5,000 clusters with three resource types each, the load is ~50 req/sec — trivial for a Rust/axum service.
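The load figures follow from simple arithmetic — each cluster hits each resource-type endpoint once per poll interval:

```python
# Polling load arithmetic: total request rate scales linearly with fleet size.
def requests_per_second(clusters: int, endpoints: int, interval_seconds: int) -> float:
    return clusters * endpoints / interval_seconds
```

At 5,000 clusters, three endpoints, and a 5-minute interval this yields 50 req/sec, matching the figure above.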
Build Commands
cargo build # Build API + CLI
cargo build --bin flux-resourceset-cli # Build CLI only
cargo test # Run all tests
cargo clippy -- -D warnings # Lint
cargo fmt # Format
Docker
make docker-build # Build container image
Code Generation
The project uses Firestone for schema-driven code generation:
make generate
This regenerates:
- `openapi/openapi.yaml` — OpenAPI 3.0 spec
- `src/models/` — Rust model structs
- `src/apis/` — Rust API client modules
- `src/generated/cli/` — CLI command modules
ResourceSet Templates
ResourceSet templates are the bridge between API data and Kubernetes resources. They use the Flux Operator’s templating engine to render manifests from the {"inputs": [...]} response.
Upstream reference: See the full ResourceSet CRD documentation for all available spec fields, status conditions, and advanced features like inventory tracking and garbage collection.
Template Syntax
ResourceSet uses `<<` and `>>` as delimiters (not `{{` and `}}`). This avoids conflicts with Helm templates and Go templates in the rendered YAML.
Key template functions:
- `<< inputs.field >>` — access input fields
- `<< inputs.nested.field >>` — access nested objects
- `<< inputs.field | slugify >>` — slugify a string for use in Kubernetes names
- `<<- range $k, $v := inputs.object >>` — iterate over object keys
- `<<- range $item := inputs.array >>` — iterate over arrays
- `<<- if inputs.field >>` — conditional rendering
- `<<- if ne inputs.field "value" >>` — conditional with comparison
- `<< inputs.object | toYaml | nindent N >>` — convert to YAML with indentation
Platform Components Template
This is the most complex template. For each component input, it renders up to three resources:
flowchart TD
I["Input from API<br/>(one per component)"] --> CM{"Has patches?"}
CM -->|yes| ConfigMap["ConfigMap<br/>values-{id}-{cluster}"]
CM -->|no| Skip["Skip ConfigMap"]
I --> HR["HelmRepository<br/>charts-{id}"]
I --> HRL["HelmRelease<br/>platform-{id}"]
ConfigMap -.->|"valuesFrom"| HRL
HR -.->|"sourceRef"| HRL
Full Template
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: platform-components
namespace: flux-system
spec:
inputsFrom:
- name: platform-components
resourcesTemplate: |
<<- if inputs.enabled >>
<<- if inputs.patches >>
---
apiVersion: v1
kind: ConfigMap
metadata:
name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
namespace: flux-system
data:
<<- range $key, $value := inputs.patches >>
<< $key >>: "<< $value >>"
<<- end >>
<<- end >>
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: charts-<< inputs.id | slugify >>
namespace: flux-system
spec:
interval: 30m
url: "<< inputs.source.oci_url >>"
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: platform-<< inputs.id >>
namespace: flux-system
spec:
interval: 10m
releaseName: << inputs.id | slugify >>
targetNamespace: << inputs.id | slugify >>
install:
remediation:
retries: 3
upgrade:
remediation:
retries: 3
chart:
spec:
chart: << inputs.component_path >>
sourceRef:
kind: HelmRepository
name: charts-<< inputs.id | slugify >>
namespace: flux-system
interval: 10m
<<- if ne inputs.component_version "latest" >>
version: "<< inputs.component_version >>"
<<- end >>
<<- if inputs.depends_on >>
dependsOn:
<<- range $dep := inputs.depends_on >>
- name: platform-<< $dep >>
<<- end >>
<<- end >>
<<- if inputs.patches >>
valuesFrom:
<<- range $key, $_ := inputs.patches >>
- kind: ConfigMap
name: values-<< inputs.id | slugify >>-<< inputs.cluster.name | slugify >>
valuesKey: << $key >>
targetPath: << $key >>
<<- end >>
<<- end >>
<<- end >>
What Each Section Does
Enabled check (<<- if inputs.enabled >>) — If the component is disabled, nothing is rendered. Flux garbage-collects previously rendered resources.
ConfigMap for patches — If the component has patches, a ConfigMap is created with the key-value pairs. The HelmRelease references this ConfigMap via valuesFrom, which maps each key to a Helm value path using targetPath.
HelmRepository — Points to the chart repository URL from inputs.source.oci_url.
HelmRelease — The core resource. Key behaviors:
- `chart` references the HelmRepository and uses `inputs.component_path` as the chart name
- `version` is only set if `component_version` is not `"latest"`
- `dependsOn` creates ordering dependencies between components
- `valuesFrom` injects per-cluster patches from the ConfigMap
Namespaces Template
Renders a Kubernetes Namespace for each input:
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: namespaces
namespace: flux-system
spec:
inputsFrom:
- name: namespaces
resourcesTemplate: |
---
apiVersion: v1
kind: Namespace
metadata:
name: << inputs.id >>
labels:
<<- range $k, $v := inputs.labels >>
<< $k >>: "<< $v >>"
<<- end >>
annotations:
<<- range $k, $v := inputs.annotations >>
<< $k >>: "<< $v >>"
<<- end >>
Labels and annotations from the API response are dynamically rendered using range.
Rolebindings Template
Renders a ClusterRoleBinding for each input:
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: rolebindings
namespace: flux-system
spec:
inputsFrom:
- name: rolebindings
resourcesTemplate: |
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: << inputs.id >>
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: << inputs.role >>
subjects:
<<- range $s := inputs.subjects >>
- kind: << $s.kind >>
name: << $s.name >>
apiGroup: << $s.apiGroup >>
<<- end >>
Template Design Principles
- One ResourceSet per resource type — keeps templates focused and failures isolated
- Conditional rendering — use `if` blocks to skip disabled components or optional fields
- Slugify names — Kubernetes resource names must be DNS-compatible; `slugify` handles this
- Garbage collection — when an input disappears from the API response, Flux removes the resources that ResourceSet previously created
- No cluster-specific logic in templates — all cluster differentiation comes from the API data, not from template conditionals
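For intuition, slugify behaves roughly like the following approximation; the Flux Operator's actual filter may differ in edge cases, so treat this as an illustrative Python sketch rather than its exact semantics:

```python
import re

# Rough approximation of a slugify filter for Kubernetes names:
# lowercase, collapse runs of non-alphanumerics to "-", trim hyphens.
def slugify(value: str) -> str:
    value = re.sub(r"[^a-z0-9]+", "-", value.lower())
    return value.strip("-")
```

This turns arbitrary input like `"Cert Manager"` into a DNS-compatible name.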
Further Reading
- ResourceSet CRD reference — full spec, status fields, inventory tracking, and health checks
- ResourceSetInputProvider CRD reference — input types, polling configuration, authentication options
- Flux Operator GitHub — source code and issue tracker
ResourceSetInputProvider
The ResourceSetInputProvider is the Flux Operator CRD that tells a ResourceSet where to fetch its input data. In this architecture, every provider uses type: ExternalService to call the flux-resourceset API.
Upstream reference: See the full ResourceSetInputProvider CRD documentation for all supported input types, authentication options, and status conditions.
How Providers Work
flowchart LR
subgraph "flux-system namespace"
P["ResourceSetInputProvider<br/>type: ExternalService"]
S["Secret<br/>internal-api-token"]
RS["ResourceSet"]
end
API["flux-resourceset API"]
P -->|"GET (with bearer token)"| API
S -.->|"secretRef"| P
P -->|"provides inputs"| RS
RS -->|"renders resources"| K8s["Kubernetes Resources"]
Provider Configuration
Each provider specifies:
- type — `ExternalService` (calls an HTTP API)
- url — the endpoint to call
- secretRef — Kubernetes Secret containing the bearer token
- reconcileEvery — how often to poll (annotation)
Platform Components Provider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: platform-components
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
type: ExternalService
url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components
insecure: true
secretRef:
name: internal-api-token
Namespaces Provider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: namespaces
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "30s"
spec:
type: ExternalService
url: http://flux-api-read.flux-system.svc.cluster.local:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/namespaces
insecure: true
secretRef:
name: internal-api-token
Rolebindings Provider
Follows the same pattern, using the `/rolebindings` endpoint.
URL Construction
In production, the provider URL uses variable substitution from the cluster-identity ConfigMap:
url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/platform-components"
This means the same provider manifest works on every cluster — only the ConfigMap values differ.
In the demo, the URL is hardcoded to the in-cluster service address and a demo cluster DNS.
Authentication
The provider references a Secret that contains the bearer token:
apiVersion: v1
kind: Secret
metadata:
name: internal-api-token
namespace: flux-system
type: Opaque
stringData:
token: "your-bearer-token-here"
The Flux Operator sends this as Authorization: Bearer <token> on every request.
For production, consider:
- Token rotation — update the Secret, Flux picks up the new token on next request
- mTLS — ResourceSetInputProvider supports `certSecretRef` for TLS client certificates
Reconciliation Behavior
| Event | Provider Behavior |
|---|---|
| Scheduled interval | Provider calls the API, ResourceSet re-renders if inputs changed |
| API returns same data | No change — ResourceSet does not re-render |
| API returns new data | ResourceSet re-renders, Flux applies the diff |
| API returns error | Provider goes not-ready, existing resources continue running |
| API unreachable | Same as error — graceful degradation |
| Manual trigger | Annotate with fluxcd.controlplane.io/requestedAt to force immediate reconcile |
Forcing Immediate Reconciliation
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
Observing Provider Status
# Check provider status
kubectl get resourcesetinputproviders -n flux-system
# Detailed status with conditions
kubectl describe resourcesetinputprovider platform-components -n flux-system
# Check ResourceSet status
kubectl get resourcesets -n flux-system
Further Reading
- ResourceSetInputProvider CRD reference — full spec, input types, auth options, status conditions
- ResourceSet CRD reference — the templating CRD that consumes provider inputs
- Flux Operator — project home with installation guides and examples
Dynamic Patching
Dynamic patching is one of the most powerful features of this architecture. It allows per-cluster, per-component value overrides without modifying Git, the component catalog, or any template. Operators can change Helm values, replica counts, feature flags, and more — and Flux reconciles the change automatically.
How Patching Works
sequenceDiagram
participant Op as Operator
participant API as flux-resourceset API
participant DB as Data Store
participant Flux as Child Cluster (Flux)
Op->>API: PATCH cluster "us-east-prod-01"<br/>patches.grafana.replicaCount = "3"
API->>DB: Update cluster document
API-->>Op: 200 OK
Note over Flux: Next poll cycle
Flux->>API: GET /clusters/{dns}/platform-components
API->>DB: Read cluster + catalog
API->>API: Merge: inject patches.grafana into grafana input
API-->>Flux: {"inputs": [{..., "patches": {"replicaCount": "3"}}]}
Flux->>Flux: ResourceSet renders ConfigMap with replicaCount=3
Flux->>Flux: HelmRelease references ConfigMap via valuesFrom
Flux->>Flux: Helm upgrade applies new replica count
The Patches Object
Patches are stored in the cluster document, keyed by component ID:
{
"cluster_dns": "us-east-prod-01.k8s.example.com",
"patches": {
"grafana": {
"replicaCount": "3",
"persistence.storageClassName": "ssd"
},
"podinfo": {
"replicaCount": "2",
"ui.color": "#2f855a",
"ui.message": "Hello from patches"
},
"traefik": {
"deployment.replicas": "1",
"service.type": "ClusterIP"
}
}
}
Each key in a component’s patches maps to a Helm value path. Dotted keys (like ui.color) map to nested Helm values.
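The dotted-key expansion can be illustrated like this — conceptually what `targetPath` achieves when Helm assembles the values tree (illustrative Python, not Flux's implementation):

```python
# Expand flat dotted patch keys into a nested Helm-style values tree.
def expand_patches(patches: dict) -> dict:
    values = {}
    for dotted_key, value in patches.items():
        node = values
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating nodes as needed
        node[leaf] = value
    return values
```

So `{"replicaCount": "2", "ui.color": "#2f855a"}` becomes `{"replicaCount": "2", "ui": {"color": "#2f855a"}}` in the chart's values.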
How Patches Become Helm Values
The ResourceSet template renders patches into a ConfigMap, then references it from the HelmRelease via valuesFrom:
flowchart TD
A["API Response<br/>patches: {replicaCount: '2', ui.color: '#2f855a'}"]
B["ConfigMap<br/>values-podinfo-cluster"]
C["HelmRelease<br/>platform-podinfo"]
D["Helm Chart<br/>podinfo"]
A -->|"ResourceSet renders"| B
B -->|"valuesFrom with targetPath"| C
C -->|"helm upgrade"| D
B -.- B1["data:<br/> replicaCount: '2'<br/> ui.color: '#2f855a'"]
C -.- C1["valuesFrom:<br/> - kind: ConfigMap<br/> valuesKey: replicaCount<br/> targetPath: replicaCount<br/> - kind: ConfigMap<br/> valuesKey: ui.color<br/> targetPath: ui.color"]
The targetPath in valuesFrom tells Helm where to inject the value in the chart’s values tree. This is a standard Flux HelmRelease feature — the innovation is that the values are computed from the API, not hardcoded in Git.
In the demo template, each generated values ConfigMap is labeled reconcile.fluxcd.io/watch: "Enabled" and each generated HelmRelease uses interval: 1m. This gives fast event-driven upgrades when values change, plus a short periodic poll interval.
Patching via CLI
The demo includes a CLI command to patch any component with dynamic key=value paths:
# Patch podinfo values on demo-cluster-01
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
--set replicaCount=3 \
--set ui.message="Hello from CLI patch" \
--set ui.color="#3b82f6"
This updates the cluster document’s patches.podinfo object in the data store.
Patching Use Cases
| Use Case | Patch Example | Effect |
|---|---|---|
| Scale a component | {"replicaCount": "3"} | Component scales to 3 replicas |
| Change UI branding | {"ui.color": "#ff0000", "ui.message": "Maintenance"} | Application UI reflects new values |
| Environment-specific tuning | {"resources.limits.memory": "512Mi"} | Different resource limits per cluster |
| Feature flags | {"feature.newDashboard": "true"} | Enable features per cluster |
| Ingress configuration | {"ingress.className": "internal"} | Different ingress class per cluster |
Patching vs. Other Override Mechanisms
| Mechanism | Scope | Requires Git PR? | Use Case |
|---|---|---|---|
| Catalog defaults | All clusters using the component | Yes (schema change) | Global default values |
| OCI tag override | One cluster, one component | No (API call) | Hotfix or canary version |
| Component path override | One cluster, one component | No (API call) | Component version upgrade |
| Patches | One cluster, one component | No (API call) | Value tuning, feature flags, scaling |
| Template changes | All clusters (template is global) | Yes (Git PR) | Changing how resources are rendered |
Patches are the most granular override — they change individual Helm values without affecting any other cluster or component.
Verifying Patches
After patching, verify the change propagated:
# Reconcile quickly
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# Check reconcile result
kubectl get hr -n flux-system platform-podinfo \
-o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'
# Check the actual deployment
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# Check rendered values
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
Multi-Cluster Management
This architecture is designed from the ground up for managing hundreds to thousands of Kubernetes clusters. The phone-home model, stateless API, and resource-driven data model all contribute to linear scaling without operational complexity growth.
Scaling Properties
graph TB
subgraph "Enterprise Fleet"
direction TB
DEV1["DEV Cluster 1"]
DEV2["DEV Cluster 2"]
DEV3["DEV Cluster ...N"]
QA1["QA Cluster 1"]
QA2["QA Cluster 2"]
UAT1["UAT Cluster 1"]
PROD1["PROD Cluster 1"]
PROD2["PROD Cluster 2"]
PROD3["PROD Cluster ...N"]
end
API["flux-resourceset API<br/>(stateless, multi-replica)"]
DEV1 & DEV2 & DEV3 -->|"poll"| API
QA1 & QA2 -->|"poll"| API
UAT1 -->|"poll"| API
PROD1 & PROD2 & PROD3 -->|"poll"| API
Why It Scales
| Property | How |
|---|---|
| Stateless API | No per-cluster state in the API process. Add replicas for HA, not for capacity. |
| Pull-based | Each cluster owns its own reconciliation loop. The API does not need to track cluster connectivity. |
| Minimal request cost | Each request = 1 data store read + 1 merge. Sub-millisecond response time. |
| Independent failures | One cluster’s provider failing does not affect any other cluster. |
| Linear polling load | 1,000 clusters polling 3 endpoints every 5 minutes = 10 req/sec. Trivial for any HTTP service. |
Fleet-Wide Operations
Rolling Out a New Component
When a new platform component needs to be deployed across the fleet:
flowchart TD
A["1. Add to component catalog<br/>(one API call)"] --> B["2. Add component_ref to<br/>target clusters<br/>(batch API calls)"]
B --> C["3. Clusters poll on schedule"]
C --> D["4. Each cluster independently<br/>installs the component"]
B -->|"DEV first"| C1["DEV clusters pick up change"]
B -->|"then QA"| C2["QA clusters pick up change"]
B -->|"then PROD"| C3["PROD clusters pick up change"]
You control rollout speed by controlling when you add the component_ref to each tier’s clusters. No pipeline orchestration — just API calls.
Upgrading a Component Version
To upgrade grafana from 17.0.0 to 17.1.0 across the fleet:
- Ensure the new version exists in the platform components OCI artifact
- Update the catalog’s `component_path` from `observability/grafana/17.0.0` to `observability/grafana/17.1.0`
- All clusters using catalog defaults pick up the change on next poll
For canary rollouts, override specific clusters first:
{
"platform_components": [
{
"id": "grafana",
"component_path": "observability/grafana/17.1.0",
"oci_tag": "v1.1.0-rc1"
}
]
}
DEV gets the new version. PROD stays on the catalog default.
Hotfix Workflow
flowchart LR
A["CVE discovered<br/>in cert-manager"] --> B["Fix merged to repo<br/>OCI artifact v1.0.0-1 built"]
B --> C["Update affected clusters<br/>oci_tag: v1.0.0-1"]
C --> D["Clusters poll and<br/>reconcile the fix"]
C -->|"Only cert-manager<br/>is affected"| E["All other components<br/>stay on v1.0.0"]
Hotfixes are per-component, per-cluster. You update oci_tag on the specific component for the specific clusters that need the fix. No full release cycle required.
Environment Tiers
The architecture has first-class support for environment-based differentiation:
| Mechanism | How It Works |
|---|---|
| `cluster.environment` | Each cluster document has an `environment` field (`dev`, `qa`, `uat`, `prod`). Included in every API response. |
| `cluster_env_enabled` | When `true` on a catalog component, the ResourceSet template appends `/{environment}` to the component path. Different environment tiers get different Kustomize overlays. |
| Per-cluster patches | Different Helm values per cluster. PROD gets 5 replicas, DEV gets 1. |
| OCI tag overrides | DEV clusters can pin to release candidates while PROD stays on stable. |
Environment-Aware Path Resolution
When cluster_env_enabled is true:
Catalog component_path: core/cert-manager/1.14.0
Cluster environment: prod
→ Resolved path: core/cert-manager/1.14.0/prod
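The rule is a straight path join — a minimal sketch:

```python
# Environment-aware path resolution: append the environment segment only
# when the catalog component opts in via cluster_env_enabled.
def resolve_component_path(component_path: str, environment: str, cluster_env_enabled: bool) -> str:
    if cluster_env_enabled:
        return f"{component_path}/{environment}"
    return component_path
```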
This enables the platform components repo to have environment-specific Kustomize overlays:
core/cert-manager/1.14.0/
├── base/
│ └── deployment.yaml
├── dev/
│ └── kustomization.yaml
├── qa/
│ └── kustomization.yaml
└── prod/
└── kustomization.yaml
Decommissioning a Cluster
flowchart LR
A["Delete cluster record<br/>from API"] --> B["All endpoints return<br/>empty inputs"]
B --> C["ResourceSets render<br/>empty resource list"]
C --> D["Flux garbage-collects<br/>all resources"]
No manual cleanup. No orphaned resources. The data model drives everything.
Enterprise Benefits Summary
| Benefit | Description |
|---|---|
| Single source of truth | One API holds the desired state for every cluster. No separate configuration management inventory, no spreadsheets, no wiki pages. |
| Cluster creation in minutes | Bootstrap cluster + phone home + reconcile. No weeks-long process involving manual playbooks and ticket queues. |
| Zero state divergence | API data = ResourceSet input = running cluster state. Drift is automatically corrected. |
| Operational velocity | Change a value via API → Flux reconciles. No PR, no review, no pipeline for operational changes. |
| Audit trail | Every API mutation is logged. Template changes go through Git. Full traceability. |
| Team autonomy | Platform engineers own templates (Git). Platform operators own data (API). Flux owns reconciliation. |
| Failure isolation | Each cluster is independent. API outage = no new changes, not cluster outage. |
| Cost efficiency | Stateless API uses minimal resources. No management cluster scaling with fleet size. |
| Infrastructure-agnostic | Same model works on-prem, in the cloud, at the edge, or across hybrid environments. No vendor lock-in. |
Versioning & Hotfix Strategy
The platform components repo uses a versioning model with two independent axes of change: the OCI artifact tag and the component path within that artifact. This enables fine-grained control over what each cluster runs.
Two Axes of Version Control
graph TD
subgraph "OCI Artifact (tagged build of the repo)"
subgraph "v1.0.0"
A1["core/cert-manager/1.14.0/"]
A2["observability/grafana/17.0.0/"]
A3["networking/ingress-nginx/4.9.0/"]
end
end
subgraph "OCI Artifact (hotfix build)"
subgraph "v1.0.0-1"
B1["core/cert-manager/1.14.1/ ← fixed"]
B2["observability/grafana/17.0.0/"]
B3["networking/ingress-nginx/4.9.0/"]
end
end
| Axis | What It Controls | How It Changes |
|---|---|---|
| OCI tag | Which build of the monorepo artifact to pull | New tag on each merge to main (v1.0.0, v1.0.0-1, v1.1.0) |
| Component path | Which version directory within the artifact to use | Update component_path in the API (observability/grafana/17.0.0 → 17.1.0) |
Normal Release Flow
flowchart LR
A["All components at<br/>v1.0.0"] --> B["New feature merged<br/>to platform repo"]
B --> C["CI builds and tags<br/>v1.1.0"]
C --> D["Update catalog<br/>oci_tag: v1.1.0"]
D --> E["All clusters pull<br/>v1.1.0 on next poll"]
In a normal release, all components point to the same OCI tag. The catalog default is updated, and every cluster picks it up.
Hotfix Flow
flowchart LR
A["CVE in cert-manager<br/>All clusters on v1.0.0"] --> B["Fix merged to<br/>cert-manager/1.14.1/"]
B --> C["CI builds and tags<br/>v1.0.0-1"]
C --> D["Update cert-manager's<br/>oci_tag to v1.0.0-1<br/>component_path to<br/>cert-manager/1.14.1"]
D --> E["Only cert-manager<br/>updates on affected clusters"]
E --> F["All other components<br/>stay on v1.0.0"]
Hotfixes use SemVer pre-release suffixes: v1.0.0-1, v1.0.0-2. This keeps them:
- Sortable — `v1.0.0-1 < v1.0.0-2 < v1.1.0`
- Tied to base release — clear which release they patch
- Temporary — the next full release collapses everything back to one tag
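Under this convention, a sort key can treat the suffix as a numeric hotfix counter. Note the hedge: strict SemVer would rank a `-N` pre-release *before* its base tag, whereas this repo's convention ranks hotfixes *after* the base release — the sketch below implements the repo's convention:

```python
# Sort key for vMAJOR.MINOR.PATCH tags with an optional numeric hotfix
# suffix (-N). Hotfixes sort after the base tag (repo convention, not
# strict SemVer pre-release precedence).
def tag_key(tag: str):
    base, _, hotfix = tag.lstrip("v").partition("-")
    major, minor, patch = (int(p) for p in base.split("."))
    return (major, minor, patch, int(hotfix) if hotfix else 0)
```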
Per-Cluster Version Pinning
Any cluster can be pinned to a different version than the catalog default:
{
"platform_components": [
{
"id": "grafana",
"oci_tag": "v1.1.0-rc1",
"component_path": "observability/grafana/17.1.0"
}
]
}
Use cases:
- Canary testing — DEV cluster gets the release candidate
- Rollback — pin a PROD cluster to the previous version while investigating
- Gradual rollout — update clusters one tier at a time
Component Lifecycle
stateDiagram-v2
[*] --> CatalogEntry: Add to catalog
CatalogEntry --> AssignedToCluster: Add component_ref to cluster
AssignedToCluster --> Running: Flux reconciles
Running --> Upgraded: Update component_path/oci_tag
Upgraded --> Running: Flux reconciles new version
Running --> Hotfixed: Per-cluster oci_tag override
Hotfixed --> Running: Next full release
Running --> Disabled: Set enabled=false
Disabled --> Running: Set enabled=true
Running --> Removed: Remove component_ref
Removed --> GarbageCollected: Flux cleans up
GarbageCollected --> [*]
Platform Components Repo Structure
appteam-flux-repo/
├── COMPONENTS.yaml # Registry — CI-validated
├── core/
│ └── cert-manager/
│ ├── 1.14.0/
│ │ ├── base/ # Shared resources
│ │ ├── dev/
│ │ │ └── kustomization.yaml
│ │ ├── qa/
│ │ │ └── kustomization.yaml
│ │ └── prod/
│ │ └── kustomization.yaml
│ └── 1.14.1/ # Hotfix version
│ └── ...
├── observability/
│ └── grafana/
│ ├── 17.0.0/
│ │ └── ...
│ └── 17.1.0/ # Upgrade version
│ └── ...
└── networking/
└── ingress-nginx/
└── 4.9.0/
└── ...
Each environment directory must be buildable in isolation: kustomize build core/cert-manager/1.14.0/prod/ must succeed.
Version Cleanup
Keep N previous versions per component (recommended: 3). CI can prune older version directories. Old OCI tags remain in the registry for emergency rollbacks.
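A pruning pass can be sketched as follows — illustrative only, since the actual CI tooling is not specified here:

```python
# Keep the newest N version directories per component; return the rest
# (oldest first) as candidates for deletion. Assumes dotted numeric versions.
def prune_versions(versions, keep=3):
    key = lambda v: tuple(int(p) for p in v.split("."))
    kept = sorted(versions, key=key, reverse=True)[:keep]
    return sorted(set(versions) - set(kept), key=key)
```

With `keep=3`, a component holding `1.12.0`, `1.13.2`, `1.14.0`, and `1.14.1` would have only `1.12.0` pruned.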
Security & Authentication
Security in this architecture operates at multiple layers: API authentication, cluster identity, network boundaries, and credential management.
Authentication Model
flowchart TD
subgraph "Child Cluster"
S["Secret: internal-api-token"]
P["ResourceSetInputProvider"]
S -->|"Bearer token"| P
end
subgraph "API Layer"
AUTH["Auth Middleware"]
API["flux-resourceset"]
AUTH -->|"validated"| API
end
P -->|"Authorization: Bearer <token>"| AUTH
Bearer Token Authentication
The API uses bearer token authentication. Tokens are configured via environment variables:
- `AUTH_TOKEN` — required for all read endpoints
- `CRUD_AUTH_TOKEN` — required for write endpoints in CRUD mode (defaults to `AUTH_TOKEN` if not set)
This separation allows:
- Read-only clusters to use a shared read token
- Operators/CI to use a separate write token
- Token rotation without affecting cluster polling (rotate read and write tokens independently)
Cluster-Side Token Storage
Each cluster stores the token in a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
name: internal-api-token
namespace: flux-system
type: Opaque
stringData:
token: "the-bearer-token"
This Secret is either:
- Pre-installed in the cluster’s bootstrap image or manifests
- Injected during cluster provisioning (via cloud-init, Terraform, Cluster API, or manual setup)
- Managed by an external secrets operator that fetches the token from a vault
Upgrading to mTLS
For stricter security requirements, the ResourceSetInputProvider supports TLS client certificates via certSecretRef:
spec:
type: ExternalService
url: https://internal-api.internal.example.com/api/v2/flux/...
certSecretRef:
name: api-client-cert
This eliminates shared bearer tokens in favor of per-cluster x.509 certificates. The API would need to be configured with a TLS server certificate and a CA trust chain.
Network Security
Recommended Network Boundaries
| Connection | Direction | Protocol | Authentication |
|---|---|---|---|
| Cluster → API | Outbound from cluster | HTTPS | Bearer token or mTLS |
| Operator → API (CRUD) | Inbound to CRUD instance | HTTPS | Bearer token (write) |
| API → Data Store | Local/internal | SQLite file access (or in-memory) | Filesystem permissions |
Network Policy Considerations
- The API does not need inbound access to clusters — it is purely pull-based
- Only the `flux-system` namespace on each cluster needs outbound access to the API
- CRUD endpoints should be restricted to operator networks or CI/CD runners
Cluster Identity
The cluster-identity ConfigMap is the root of trust for each cluster:
data:
CLUSTER_NAME: "us-east-prod-01"
CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
ENVIRONMENT: "prod"
INTERNAL_API_URL: "https://internal-api.internal.example.com"
This ConfigMap determines:
- Which API endpoint the cluster calls
- Which cluster DNS is used in the URL path (determines what data the cluster receives)
- What environment tier the cluster belongs to
The ConfigMap is injected during cluster provisioning and should be treated as immutable after bootstrap.
Data Access Control
The API enforces access control at the endpoint level:
| Endpoint | Token Required | Access Level |
|---|---|---|
| `/api/v2/flux/...` | `AUTH_TOKEN` | Read-only — clusters can only read their own data via DNS path |
| `/clusters`, `/platform_components`, etc. | `CRUD_AUTH_TOKEN` | Read-write — operators can modify any cluster |
| `/health`, `/ready` | None | Public — Kubernetes probes |
| `/openapi.yaml` | None | Public — API documentation |
Per-Cluster Data Isolation
Each cluster can only access its own data because the API path includes the cluster DNS:
GET /api/v2/flux/clusters/us-east-prod-01.k8s.example.com/platform-components
A cluster cannot query another cluster’s configuration without knowing (and requesting) a different DNS path. The bearer token does not provide cross-cluster access control — all clusters share the same read token. If per-cluster token isolation is required, implement it as an API middleware enhancement.
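A minimal sketch of such a middleware check. The production service is Rust; this Python version only illustrates the logic, and the token mapping and function names are hypothetical:

```python
# Illustrative sketch of per-cluster token isolation. The shipped API does not
# include this; the token mapping and names here are hypothetical.

PER_CLUSTER_TOKENS = {
    # token -> the cluster DNS that token is allowed to read
    "tok-us-east": "us-east-prod-01.k8s.example.com",
}

def authorize_flux_read(path: str, bearer_token: str) -> bool:
    """Allow /api/v2/flux/clusters/{dns}/... only if the token owns {dns}."""
    prefix = "/api/v2/flux/clusters/"
    if not path.startswith(prefix):
        return False
    cluster_dns = path[len(prefix):].split("/", 1)[0]
    return PER_CLUSTER_TOKENS.get(bearer_token) == cluster_dns
```

With this in place, a cluster presenting its own token can read only its own paths; requests for any other cluster DNS are rejected.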
Secrets in the Data Model
The patches object supports arbitrary key-value pairs. Do not store sensitive values (passwords, API keys, private certificates) in patches. Instead:
- Use Kubernetes Secrets + ExternalSecrets Operator for sensitive values
- Use patches only for non-sensitive configuration (replica counts, feature flags, resource limits)
- For sensitive Helm values, use valuesFrom with a Secret instead of a ConfigMap
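A sketch of that last point: a HelmRelease consuming sensitive values from a Secret. The chart and Secret names are illustrative:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: platform-example
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: example
      sourceRef:
        kind: HelmRepository
        name: example-repo
  valuesFrom:
    # Sensitive values come from a Secret managed outside the API, e.g. by ESO
    - kind: Secret
      name: example-sensitive-values
      valuesKey: values.yaml
```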
Local Demo
This guide walks through running the full demo on a local kind cluster. By the end, you will have:
- A kind cluster with Flux Operator installed
- The flux-resourceset API deployed with seed data
- ResourceSetInputProviders polling the API
- ResourceSets rendering and reconciling platform components, namespaces, and rolebindings
Prerequisites
Required tools:
- Rust/Cargo — build the API and CLI
- Docker — container runtime for kind
- kind — local Kubernetes clusters
- kubectl — Kubernetes CLI
- flux CLI — manual reconcile commands (flux reconcile ...)
- curl — HTTP requests
Optional tools:
- jq — pretty JSON output
- Poetry + Python 3 — for make generate (code generation only)
- openapi-generator — for Rust model generation (code generation only)
One-Command Demo
cd flux-resourceset
make demo
This runs kind-create and kind-demo, which:
- Builds the Docker image (flux-resourceset:local)
- Creates a kind cluster named flux-demo
- Loads the image into the cluster
- Installs the Flux Operator from upstream
- Applies base Kubernetes manifests (FluxInstance, RBAC, services)
- Waits for Flux controllers to be ready
- Creates a seed data ConfigMap from data/seed.json
- Deploys the API (read-only + CRUD instances)
- Applies ResourceSetInputProviders
- Applies ResourceSets
What Gets Deployed
graph TB
subgraph "flux-system namespace"
API_R["flux-api-read<br/>(read-only mode)"]
API_C["flux-api-crud<br/>(CRUD mode)"]
SEED["ConfigMap: flux-api-seed-data"]
P1["Provider: platform-components"]
P2["Provider: namespaces"]
P3["Provider: rolebindings"]
RS1["ResourceSet: platform-components"]
RS2["ResourceSet: namespaces"]
RS3["ResourceSet: rolebindings"]
HR1["HelmRelease: platform-cert-manager"]
HR2["HelmRelease: platform-traefik"]
HR3["HelmRelease: platform-podinfo"]
end
subgraph "Created namespaces"
NS1["cert-manager"]
NS2["traefik"]
NS3["podinfo"]
end
SEED -->|"loaded at startup"| API_R
SEED -->|"loaded at startup"| API_C
P1 -->|"polls"| API_R
P2 -->|"polls"| API_R
P3 -->|"polls"| API_R
P1 --> RS1
P2 --> RS2
P3 --> RS3
RS1 -->|"renders"| HR1 & HR2 & HR3
RS2 -->|"renders"| NS1 & NS2 & NS3
RS3 -->|"renders"| CRB["ClusterRoleBindings:<br/>platform-admins, dev-readers"]
Seed Data
The demo uses data/seed.json which contains:
One cluster: demo-cluster-01
- Environment: dev
- 3 platform components: cert-manager, traefik, podinfo
- 3 namespaces: cert-manager, traefik, podinfo
- 2 rolebindings: platform-admins (cluster-admin), dev-readers (view)
- Patches for podinfo (replica count, UI color, UI message) and traefik (replicas, service type)
Three catalog entries: cert-manager, traefik, podinfo — each pointing to public Helm chart repositories.
Checking Status
After make demo, verify everything is running:
# Check pods
kubectl get pods -n flux-system
# Check providers
kubectl get resourcesetinputproviders -n flux-system
# Check resourcesets
kubectl get resourcesets -n flux-system
# Check HelmReleases
kubectl get helmreleases -n flux-system
# Check created namespaces
kubectl get namespaces
# Check rolebindings
kubectl get clusterrolebindings platform-admins dev-readers
Running the CLI Demo
The automated CLI demo flow exercises the full lifecycle:
Step 1: Port-forward the API
make cli-demo-port-forward
This exposes the API on http://127.0.0.1:8080.
Step 2: Run the CLI demo
In another terminal:
make cli-demo
This:
- Builds the CLI
- Lists clusters and namespaces
- Adds a new namespace (demo-runtime) via CLI
- Forces reconciliation
- Waits for the namespace to be created
- Verifies the namespace exists
Step 3: Manual CLI exploration
export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
-o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"
# List clusters
./target/debug/flux-resourceset-cli cluster list | jq .
# List namespaces
./target/debug/flux-resourceset-cli namespace list | jq .
# Get Flux-formatted platform components
curl -s -H "Authorization: Bearer $FLUX_API_TOKEN" \
http://127.0.0.1:8080/api/v2/flux/clusters/demo-cluster-01.k8s.example.com/platform-components | jq .
Podinfo Patch Demo
This demonstrates dynamic patching — changing Helm values via the API and watching Flux reconcile:
# 1. Check current state
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# 2. Patch via CLI
./target/debug/flux-resourceset-cli demo patch-component demo-cluster-01 podinfo \
--set replicaCount=3 \
--set ui.message="Hello from CLI patch" \
--set ui.color="#3b82f6" | jq .
# 3. Force reconcile inputs/templates
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 4. Trigger immediate Helm reconcile
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# 5. Verify
kubectl get hr -n flux-system platform-podinfo \
-o jsonpath='ready={.status.conditions[?(@.type=="Ready")].status} reason={.status.conditions[?(@.type=="Ready")].reason} action={.status.lastAttemptedReleaseAction}{"\n"}'
kubectl get configmap -n flux-system values-podinfo-demo-cluster-01 \
-o jsonpath='replicas={.data.replicaCount} color={.data.ui\.color} message={.data.ui\.message}{"\n"}'
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
# 6. Optional: check the UI
kubectl -n podinfo port-forward svc/podinfo 9898:9898
# Open http://127.0.0.1:9898
Cleanup
make kind-delete
# or
make clean
This deletes the kind cluster and all associated resources.
CLI Usage
flux-resourceset-cli is a command-line tool for interacting with the CRUD API. It is built from the same codebase and generated from the same Firestone schemas as the API.
Building
cd flux-resourceset
cargo build --bin flux-resourceset-cli
The binary is at target/debug/flux-resourceset-cli.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| FLUX_API_URL | yes | API base URL (e.g., http://127.0.0.1:8080) |
| FLUX_API_TOKEN | yes | Bearer token for read operations |
| FLUX_API_WRITE_TOKEN | yes | Bearer token for write operations |
Setup from Demo Cluster
export FLUX_API_URL=http://127.0.0.1:8080
export FLUX_API_TOKEN="$(kubectl -n flux-system get secret internal-api-token \
-o jsonpath='{.data.token}' | base64 -d)"
export FLUX_API_WRITE_TOKEN="$FLUX_API_TOKEN"
Commands
Cluster Operations
# List all clusters
flux-resourceset-cli cluster list
# Get a specific cluster
flux-resourceset-cli cluster get demo-cluster-01
Namespace Operations
# List all namespaces
flux-resourceset-cli namespace list
# Get a specific namespace
flux-resourceset-cli namespace get cert-manager
# Create namespace record and attach reference to a cluster
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
--label team=sandbox --annotation owner=platform
# Attach/detach an existing namespace record
flux-resourceset-cli namespace assign team-sandbox --cluster demo-cluster-01
flux-resourceset-cli namespace unassign team-sandbox --cluster demo-cluster-01
Platform Component Operations
# List all catalog components
flux-resourceset-cli component list
# Get a specific component
flux-resourceset-cli component get cert-manager
# Create/ensure catalog component, then attach to cluster
flux-resourceset-cli component create cert-manager \
--component-path core/cert-manager/1.14.0 \
--component-version 1.14.0 \
--oci-url oci://registry.example/platform-components \
--oci-tag v1.0.0 \
--cluster demo-cluster-01
# Attach/detach existing component references
flux-resourceset-cli component assign cert-manager --cluster demo-cluster-01
flux-resourceset-cli component unassign cert-manager --cluster demo-cluster-01
# Patch per-cluster component values
flux-resourceset-cli component patch podinfo --cluster demo-cluster-01 --set replicaCount=3
Demo Commands
The CLI includes demo-specific commands for common workflows:
# Add a namespace to a cluster
flux-resourceset-cli demo add-namespace <cluster-id> <namespace> \
--label team=platform \
--annotation owner=you
# Patch one component using dynamic key/value paths
flux-resourceset-cli demo patch-component <cluster-id> <component-id> \
--set replicaCount=3 \
--set ui.message="Hello" \
--set ui.color="#3b82f6"
# Get Flux-formatted namespace response
flux-resourceset-cli demo flux-namespaces <cluster-dns>
Output
All CLI commands output JSON. Pipe to jq for pretty formatting:
flux-resourceset-cli cluster list | jq .
Workflow Examples
Add a namespace and watch Flux create it
# 1. Create namespace + attach reference
flux-resourceset-cli namespace create team-sandbox --cluster demo-cluster-01 \
--label team=sandbox --annotation owner=platform
# 2. Force reconcile
kubectl annotate resourcesetinputprovider namespaces -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset namespaces -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 3. Wait and verify
kubectl get ns team-sandbox
Patch a component and verify
# 1. Patch
flux-resourceset-cli demo patch-component demo-cluster-01 podinfo --set replicaCount=5
# 2. Refresh provider + resourceset
kubectl annotate resourcesetinputprovider platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
kubectl annotate resourceset platform-components -n flux-system \
fluxcd.controlplane.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --overwrite
# 3. Trigger immediate Helm upgrade
flux reconcile helmrelease platform-podinfo -n flux-system --with-source
# 4. Verify
kubectl get deploy -n podinfo podinfo \
-o jsonpath='replicas={.spec.replicas} color={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_COLOR")].value} message={.spec.template.spec.containers[0].env[?(@.name=="PODINFO_UI_MESSAGE")].value}{"\n"}'
Extending with New Resource Types
The architecture is designed to be extended with new resource types beyond the initial three (platform-components, namespaces, rolebindings). Adding a new resource type follows a consistent pattern.
The Pattern
Every resource type requires four pieces:
flowchart TD
A["1. Data Schema<br/>(Firestone resource definition)"] --> B["2. API Endpoint<br/>(returns {inputs: [...]})"]
B --> C["3. ResourceSetInputProvider<br/>(calls the endpoint)"]
C --> D["4. ResourceSet Template<br/>(renders Kubernetes resources)"]
Step-by-Step: Adding Network Policies
Let’s walk through adding a network-policies resource type.
Step 1: Define the Firestone Schema
Create resources/network_policy.yaml:
kind: network_policy
apiVersion: v1
schema:
type: object
required: [id, target_namespace, ingress_rules]
properties:
id:
type: string
example: allow-monitoring
target_namespace:
type: string
example: monitoring
ingress_rules:
type: array
items:
type: object
properties:
from_namespace:
type: string
port:
type: integer
Step 2: Add to the Cluster Schema
In resources/cluster.yaml, add a network_policies array:
network_policies:
type: array
items:
$ref: "#/components/schemas/network_policy_ref"
description: Network policies to sync to this cluster.
Step 3: Regenerate Code
make generate
This updates the OpenAPI spec, Rust models, and CLI modules.
Step 4: Implement the API Endpoint
Add GET /api/v2/flux/clusters/{cluster_dns}/network-policies that returns:
{
"inputs": [
{
"id": "allow-monitoring",
"target_namespace": "monitoring",
"ingress_rules": [
{ "from_namespace": "prometheus", "port": 9090 }
],
"cluster": {
"name": "us-east-prod-01",
"dns": "us-east-prod-01.k8s.example.com",
"environment": "prod"
}
}
]
}
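The response follows the same contract as the other endpoints: each record assigned to the cluster is wrapped in {"inputs": [...]} with the cluster identity block attached. A minimal sketch of that assembly (the real service is Rust; this Python version and its names are illustrative only):

```python
def build_flux_inputs(cluster: dict, records: list[dict]) -> dict:
    """Wrap per-cluster records in the {"inputs": [...]} contract,
    attaching the cluster identity block to every input."""
    identity = {
        "name": cluster["name"],
        "dns": cluster["dns"],
        "environment": cluster["environment"],
    }
    return {"inputs": [{**r, "cluster": identity} for r in records]}
```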
Step 5: Create the ResourceSetInputProvider
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSetInputProvider
metadata:
name: network-policies
namespace: flux-system
annotations:
fluxcd.controlplane.io/reconcileEvery: "5m"
spec:
type: ExternalService
url: "${INTERNAL_API_URL}/api/v2/flux/clusters/${CLUSTER_DNS}/network-policies"
secretRef:
name: internal-api-token
Step 6: Create the ResourceSet Template
apiVersion: fluxcd.controlplane.io/v1
kind: ResourceSet
metadata:
name: network-policies
namespace: flux-system
spec:
inputsFrom:
- name: network-policies
resourcesTemplate: |
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: << inputs.id >>
namespace: << inputs.target_namespace >>
spec:
podSelector: {}
policyTypes:
- Ingress
ingress:
<<- range $rule := inputs.ingress_rules >>
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: << $rule.from_namespace >>
ports:
- port: << $rule.port >>
<<- end >>
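Given the sample API response from Step 4, this template renders roughly the following resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: monitoring
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: prometheus
      ports:
        - port: 9090
```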
Step 7: Deploy
Add the provider and ResourceSet to the bootstrap manifests (for new clusters) and apply them to existing clusters.
What Makes This Extensible
| Aspect | How It Helps |
|---|---|
| Consistent contract | Every resource type uses {"inputs": [...]} — same provider, same pattern |
| Independent providers | Each resource type polls independently — no coupling |
| Schema-driven | Firestone generates models, OpenAPI, and CLI for new types automatically |
| Template isolation | Each ResourceSet template handles one type — no monolithic templates |
Ideas for Additional Resource Types
| Resource Type | Kubernetes Resources | Use Case |
|---|---|---|
| Network Policies | NetworkPolicy | Per-cluster network segmentation |
| Resource Quotas | ResourceQuota, LimitRange | Namespace resource limits |
| Secrets | ExternalSecret (ESO) | Centralized secret management |
| Ingress Routes | Ingress, IngressRoute | Per-cluster routing rules |
| Custom CRDs | Any custom resource | Organization-specific resources |
Each follows the same four-piece pattern: schema, endpoint, provider, template.
Frequently Asked Questions
Architecture & Design Decisions
Why an API instead of direct Kubernetes API access?
A common reaction is: “Why not just give operators kubectl access or build tooling that talks directly to the Kubernetes API on each cluster?”
The answer comes down to control, safety, and scale:
| Concern | Direct Kubernetes API | Purpose-built API (this) |
|---|---|---|
| Blast radius | One bad kubectl apply can break a cluster. Operators need kubeconfig access to every cluster. | All changes flow through a single API with validation. No direct cluster access needed for platform operations. |
| Business logic | The Kubernetes API has no concept of “platform components,” “environment tiers,” or “component catalogs.” You build that logic into scripts. | The API encodes your organization’s domain model. Merge logic, catalog defaults, environment resolution, and patching rules are built in. |
| Audit trail | Kubernetes audit logs are per-cluster and verbose. Correlating “who changed what across 200 clusters” is painful. | One API, one audit log. Every mutation is traceable to a user, timestamp, and change payload. |
| Integration | Integrating CI/CD, chatops, ticketing, or approval workflows with raw Kubernetes APIs across many clusters requires custom glue per cluster. | One REST API to integrate with. Webhooks, CI pipelines, Slack bots, and approval systems all talk to one endpoint. |
| Credential management | Operators (or CI) need kubeconfigs for every cluster. Rotating credentials means touching every cluster. | Operators need one API token. Clusters hold one read token. Token rotation is centralized. |
| Consistency | Without enforcement, two operators can configure the same component differently on two clusters. Scripts drift. | The catalog + merge model guarantees consistent computed state. Per-cluster differences are explicit and auditable. |
| Rollback | Rolling back a kubectl apply requires knowing exactly what was applied and in what order. | Revert the API data. Next poll cycle, Flux reconciles back. |
In short: The Kubernetes API is a powerful infrastructure primitive, but it is not a platform management API. This service adds the domain logic, guardrails, and integration surface that enterprise operations require.
Is this actually GitOps?
Yes — with a nuance. This is a GitOps-based model that adds an API-driven data layer.
The GitOps principles are preserved:
- Declarative — desired state is declared in structured data (API) and templates (Git)
- Versioned and immutable — templates are version-controlled in Git. API data changes are auditable and reversible.
- Pulled automatically — clusters pull their state; no manual push required
- Continuously reconciled — Flux detects and corrects drift automatically
What the API adds:
- Dynamic data — instead of static YAML files per cluster, the API computes each cluster’s state from catalog + overrides
- Operational velocity — data changes (scaling, patching, enabling/disabling) do not require Git PRs
- Business logic — merge rules, catalog defaults, and environment resolution happen in the API, not in Git overlays
The templates that govern how resources are deployed still live in Git and go through standard review. The API controls what is deployed where — the operational data plane.
Why not ArgoCD ApplicationSets?
ArgoCD ApplicationSets solve a similar problem (managing resources across many clusters) but take a fundamentally different approach:
| Aspect | ArgoCD ApplicationSets | This architecture |
|---|---|---|
| Model | Push from management cluster | Pull from each cluster |
| Management cluster dependency | Required — ArgoCD must maintain connections to all clusters | Not required for platform management — clusters are autonomous |
| Failure mode | Management cluster down = no reconciliation anywhere | API down = clusters keep running, just cannot get updates |
| Kubeconfig management | ArgoCD needs kubeconfigs for every target cluster | Each cluster holds one API bearer token |
| Network direction | Management cluster → target clusters (requires inbound access to clusters) | Target clusters → API (outbound only) |
| Data source | Git repos with generators (list, cluster, git, matrix) | API with merge logic and dynamic catalog |
| Per-cluster overrides | Generators + overlays (can get complex) | First-class patches object in the API |
Both are valid approaches. ApplicationSets work well when you have a stable management cluster with reliable connectivity to all targets. The phone-home model works better when clusters are distributed, network connectivity is unreliable, or you need clusters to be autonomous.
Does this work on-premises?
Yes. The architecture is infrastructure-agnostic. It has no dependency on any specific cloud provider, VM provisioner, or Kubernetes distribution.
| Environment | Requirements |
|---|---|
| On-prem bare metal | Kubernetes cluster with Flux Operator installed. Outbound HTTPS to the API. |
| On-prem VMs | Same — any hypervisor (VMware, KVM, Hyper-V). |
| Public cloud (EKS, AKS, GKE) | Deploy Flux Operator as a Helm chart or add-on. |
| Edge / remote sites | Lightweight K8s (k3s, k0s, MicroK8s). Can work over VPN or direct internet. |
| Air-gapped | Possible with a local API mirror and OCI registry mirror inside the air gap. |
| Hybrid | Mix any of the above. Every cluster phones home to the same API. |
The provisioning tooling is completely decoupled. Whether you use Terraform, Cluster API, Crossplane, Rancher, manual scripts, or your own management cluster — once Flux is running and the cluster-identity ConfigMap exists, the phone-home loop works.
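For example, a provisioning pipeline's final bootstrap step might apply the identity ConfigMap directly. The data keys below match the Cluster Identity section; the flux-system namespace is an assumption for this sketch:

```yaml
# Applied once at bootstrap by the provisioning tool of your choice
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-identity
  namespace: flux-system
data:
  CLUSTER_NAME: "us-east-prod-01"
  CLUSTER_DNS: "us-east-prod-01.k8s.internal.example.com"
  ENVIRONMENT: "prod"
  INTERNAL_API_URL: "https://internal-api.internal.example.com"
```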
Why separate read-only and CRUD modes?
The two modes serve fundamentally different access patterns:
| Mode | Consumers | Pattern | Scaling |
|---|---|---|---|
| read-only | Hundreds/thousands of clusters polling | High concurrency, small payloads, predictable load | Multi-replica, horizontal scaling |
| crud | Operators, CLI, CI/CD pipelines | Low concurrency, larger payloads, bursty | Single replica or small deployment |
Separating them gives you:
- Independent scaling — read replicas scale with fleet size; CRUD does not need to
- Security boundary — read-only instances never accept writes; separate tokens for each
- Blast radius — a CRUD deployment issue does not affect cluster polling
- Simpler operations — read-only instances are stateless and disposable
Operational Questions
What happens if the API goes down?
Clusters keep running. They continue reconciling from their last-known state. Existing HelmReleases, Namespaces, and ClusterRoleBindings all remain in place and healthy.
What stops working:
- New configuration changes are not picked up until the API recovers
- The ResourceSetInputProvider status shows not-ready
- Alerts should fire based on provider status conditions
This is a key advantage over push-based models — API downtime is an inconvenience, not an outage.
How do I roll back a bad change?
- Revert the API data — update the cluster document or catalog entry back to the previous state
- Wait for next poll — or force an immediate reconcile with kubectl annotate
- Flux reconciles — the ResourceSet re-renders with the reverted data, and Flux applies the diff
For template changes (in Git), use standard Git revert workflows. Flux picks up the reverted template on next reconcile.
How do I handle secrets?
The patches object is for non-sensitive configuration only (replica counts, feature flags, resource limits). For secrets:
- Use the External Secrets Operator to sync secrets from a vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc.)
- Reference Kubernetes Secrets in HelmRelease valuesFrom instead of ConfigMaps
- Add an external-secrets resource type to the API to manage ESO ExternalSecret resources via the same phone-home pattern
Can I use this with existing Flux installations?
Yes. The ResourceSetInputProvider and ResourceSet are standard Flux Operator CRDs. They coexist with existing GitRepositories, HelmRepositories, Kustomizations, and HelmReleases.
You can adopt incrementally:
- Install the Flux Operator alongside existing Flux controllers
- Deploy providers and ResourceSets for one resource type (e.g., namespaces)
- Migrate additional resource types as confidence grows
- Existing Git-based Flux resources continue working unchanged
How does this compare to Helm value files per cluster?
| Aspect | Helm values per cluster | API-driven patching |
|---|---|---|
| Storage | YAML files in Git (one per cluster, or overlays) | Structured data in the API |
| Updating 100 clusters | 100 file edits + PR | Batch API call |
| Per-cluster customization | Overlay hierarchy (can get deeply nested) | Flat patches object per cluster per component |
| Dynamic values | Requires scripted Git commits | API call → next poll → reconciled |
| Review requirement | Git PR for every change (even scaling) | API auth for data changes; Git PR for template changes |
| Merge conflicts | Possible with concurrent PRs | Not possible — API handles concurrency |
Can I extend this beyond platform components?
Yes. The architecture is designed for it. Any Kubernetes resource type can be managed this way. See the Extending chapter for a step-by-step walkthrough.
Ideas that organizations have considered:
- Network policies
- Resource quotas and limit ranges
- External secrets
- Ingress routes and TLS certificates
- Custom CRDs specific to the organization
- Monitoring and alerting configurations (PrometheusRule, ServiceMonitor)
Each follows the same pattern: schema, endpoint, provider, template.
Performance & Scale
How many clusters can this support?
The API is stateless and the per-request cost is minimal (one data store read + one merge). Rough numbers:
| Clusters | Resource Types | Poll Interval | Requests/sec |
|---|---|---|---|
| 100 | 3 | 5 min | 1 |
| 500 | 3 | 5 min | 5 |
| 1,000 | 3 | 5 min | 10 |
| 5,000 | 3 | 5 min | 50 |
| 10,000 | 5 | 5 min | 167 |
Even at 10,000 clusters with 5 resource types, the load is ~167 req/sec — well within the capacity of a small API deployment. Add read replicas for HA, not for throughput.
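The arithmetic behind the table is straightforward: each cluster issues one request per resource type per poll interval. A quick sketch:

```python
def poll_rate(clusters: int, resource_types: int, interval_seconds: int) -> float:
    """Steady-state API requests/sec for a fleet polling on a fixed interval."""
    return clusters * resource_types / interval_seconds

# 10,000 clusters x 5 resource types on a 5-minute (300 s) interval
print(round(poll_rate(10_000, 5, 300)))  # -> 167
```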
What is the latency from API change to cluster reconciliation?
It depends on the poll interval configured on the ResourceSetInputProvider. The default is 5 minutes. For faster feedback:
- Set fluxcd.controlplane.io/reconcileEvery: "30s" on the provider (the demo uses this)
- Force immediate reconciliation by annotating the provider with fluxcd.controlplane.io/requestedAt
- In practice, 5-minute intervals are fine for production — platform component changes are not latency-sensitive
Does every cluster get the full catalog?
No. Each cluster only receives the components, namespaces, and rolebindings assigned to it in the cluster document. The API computes a cluster-specific response — a cluster with 5 components gets 5 inputs, not the entire catalog.