Grafana
Grafana is an open-source platform for monitoring and observability, specializing in data visualization. It allows users to create, explore, and share dashboards with real-time metrics, logs, and traces.
Role in the LGTM Stack
Grafana is a core component of the LGTM stack (Loki, Grafana, Tempo, Mimir), which is a modern observability suite:
- Loki: Log aggregation and querying
- Grafana: Visualization and dashboarding
- Tempo: Distributed tracing
- Mimir: Scalable, long-term metrics storage
Grafana acts as the central interface, enabling users to query and visualize data from Loki, Tempo, and Mimir in a single pane of glass.
Grafana Operator
The Grafana Operator is a Kubernetes-native way to deploy and manage Grafana instances, dashboards, and datasources. It automates the lifecycle of Grafana resources, ensuring consistency and reproducibility.
Key Features
- Declarative Management: Grafana instances, dashboards, and datasources are defined as Kubernetes Custom Resources (CRs).
- GitOps-Friendly: Configuration is stored as code, so changes can be versioned and reviewed in Git before they reach the cluster.
- Automated Reconciliation: The operator ensures the actual state matches the desired state defined in the CRs.
Within the HavenPlus stack, by default we let the Grafana Operator deploy the following:
- A Grafana instance in the grafana namespace
- Datasources for Loki, Tempo, and Mimir in the grafana namespace
- A set of Kubernetes dashboard resources in the grafana namespace
All these resources can be found in infrastructure/grafana/config/base, with the required overlays in infrastructure/grafana/config/overlays/.
Furthermore, the Grafana Operator itself lives in the grafana-operator namespace.
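As an illustration of what such a declaratively managed resource can look like, here is a minimal sketch of a Loki datasource as a Grafana Operator custom resource. The resource name, the instance label, and the Loki URL are assumptions made for this example; the actual manifests in infrastructure/grafana/config/base may differ.

```yaml
# Hypothetical example: name, label, and URL are assumptions, not the repository's values.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: loki                  # assumed resource name
  namespace: grafana
spec:
  # Selects the Grafana instance(s) this datasource is attached to.
  instanceSelector:
    matchLabels:
      dashboards: grafana     # assumed label on the Grafana CR
  datasource:
    name: Loki
    type: loki
    access: proxy             # Grafana's backend proxies the queries
    url: http://loki.loki.svc:3100   # assumed in-cluster Service address
```

The operator reconciles this resource into every Grafana instance matched by instanceSelector, so the datasource does not have to be created by hand in the UI.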
Single Sign-On with Keycloak
Grafana is preconfigured to delegate authentication to the cluster's own Keycloak instance using OIDC.
Front-channel and back-channel URLs
OIDC authentication splits into two distinct traffic paths. Configuring them correctly is the most important part of cluster setup:
| Channel | What | Who calls it | Configured as |
|---|---|---|---|
| Front-channel | /auth — the authorization endpoint the browser is redirected to | The user's browser | auth_url |
| Back-channel | /token and /userinfo — the endpoints Grafana calls after the browser returns | Grafana pod (server-side) | token_url, api_url |
The front-channel URL must use the public Keycloak hostname (e.g. https://keycloak.<cluster>.example), because the browser is the one resolving it — it has no awareness of the cluster's internal DNS.
The back-channel URLs should point at the in-cluster Kubernetes Service (http://keycloak-service.keycloak-instances:8080). This keeps server-to-server token exchange inside the cluster: no LoadBalancer egress, no ingress hop, lower latency, smaller attack surface.
For this split to work, the Keycloak side must enable hostname.backchannelDynamic: true — see the Keycloak page for the matching Keycloak config. With both halves in place, the iss claim Keycloak puts into tokens stays canonical (the public hostname) regardless of which path the back-channel call took.
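For orientation, the Keycloak half of this split could look roughly like the following fragment of the Keycloak custom resource. This is a sketch only: the resource name is inferred from the keycloak-service Service name, and the authoritative config is the one described on the Keycloak page.

```yaml
# Sketch under stated assumptions; see the Keycloak page for the real config.
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak                    # assumed; would yield a Service named keycloak-service
  namespace: keycloak-instances
spec:
  hostname:
    # Public URL: used for browser-facing links and as the token issuer.
    hostname: https://keycloak.<cluster>.example
    # Allow back-channel requests to arrive via the in-cluster Service address.
    backchannelDynamic: true
```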
Configuring SSO per cluster
The base Grafana CR in infrastructure/grafana/config/base/grafana.yaml already contains the full OIDC config skeleton with set-in-overlay placeholders. Each cluster overlay fills in the cluster-specific URLs:
# infrastructure/grafana/config/overlays/<cluster>/patches/grafana.yaml
spec:
  config:
    server:
      root_url: "https://grafana.<cluster>.example"
    auth.generic_oauth:
      # Front-channel: browser-facing public hostname.
      auth_url: "https://keycloak.<cluster>.example/realms/havenplus/protocol/openid-connect/auth"
      # Back-channel: in-cluster Service DNS.
      token_url: "http://keycloak-service.keycloak-instances:8080/realms/havenplus/protocol/openid-connect/token"
      api_url: "http://keycloak-service.keycloak-instances:8080/realms/havenplus/protocol/openid-connect/userinfo"
  httpRoute:
    spec:
      hostnames: [ "grafana.<cluster>.example" ]
The matching Grafana client redirectUris[0] must be patched in infrastructure/keycloak/instances/overlays/<cluster>/kustomization.yaml to https://grafana.<cluster>.example/login/generic_oauth.
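One way to express that redirect URI patch is a JSON 6902 patch in the overlay's kustomization.yaml. The fragment below is a hedged sketch: it assumes the realm is defined in a KeycloakRealmImport resource and that the Grafana client is the first entry in its clients list, so verify the index against the actual realm.yaml before using it.

```yaml
# infrastructure/keycloak/instances/overlays/<cluster>/kustomization.yaml (fragment)
# Assumption: the Grafana client is clients[0] in the imported realm.
patches:
  - target:
      kind: KeycloakRealmImport
    patch: |-
      - op: replace
        path: /spec/realm/clients/0/redirectUris/0
        value: https://grafana.<cluster>.example/login/generic_oauth
```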
Group → role mapping
Roles are derived from the user's Keycloak group membership:
role_attribute_path: >-
  contains(groups[*], 'k8s-admins') && 'GrafanaAdmin' ||
  contains(groups[*], 'k8s-developers') && 'Editor' || 'Viewer'
- Members of the k8s-admins realm group are mapped to GrafanaAdmin.
- Members of k8s-developers are mapped to Editor.
- Anyone else logging in successfully will be a Viewer.
To grant a Keycloak user access at a specific level, add them to the matching realm group in the havenplus realm.
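Two prerequisites are easy to miss with this mapping. Grafana only honours a GrafanaAdmin result when allow_assign_grafana_admin is enabled, and the expression can only match if Keycloak actually sends a groups claim. The fragment below sketches the Grafana side; the base CR may already contain this setting, so treat it as a checklist item rather than a required change.

```yaml
# Sketch: a setting that role_attribute_path depends on (may already be set in the base CR).
spec:
  config:
    auth.generic_oauth:
      # Without this, Grafana does not grant server admin to users mapped to GrafanaAdmin.
      allow_assign_grafana_admin: "true"
```

If every user ends up as Viewer regardless of group, the groups claim is a likely first place to look: it has to be present in the ID token or userinfo response for the expression to see it.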
Granting a user access to Grafana
Once SSO is wired up via GitOps, granting a person access is a Keycloak admin task. Every step is performed in the Keycloak Admin Console:
- Open the Keycloak Admin Console at https://keycloak.<cluster>.example and log in as temp-admin (the default admin user created by the Keycloak Operator). The password lives in the keycloak Sealed Secret under infrastructure/keycloak/instances/overlays/<cluster>/.
- Switch to the havenplus realm using the realm selector in the top-left corner of the console.
- Create the user. Go to Users → Add user.
- Set the password. On the new user's page, open the Credentials tab → Set password.
- Assign the user to a group. Open the Groups tab → Join Group, then pick one of:
  - k8s-admins: Grafana GrafanaAdmin.
  - k8s-developers: Grafana Editor.
  - (no group): Grafana Viewer.
  These groups are imported declaratively by realm.yaml. If they're missing, the KeycloakRealmImport did not run successfully; see Realm import limitations.
- Verify the login. In an incognito window, visit https://grafana.<cluster>.example, click Sign in with Keycloak, and log in as the new user. You should land on Grafana's home dashboard with the assigned role visible under Profile → Preferences. If you instead see User not allowed or Login failed, check that users.allow_sign_up is "true" in the overlay (see below) or pre-create a matching Grafana user.
Notes on allow_sign_up
The base config sets users.allow_sign_up: "false" — Grafana will not auto-create a Grafana user record on first SSO. Users must either be pre-created in Grafana, or allow_sign_up must be set to "true" in the overlay. On real clusters, prefer pre-creation (auditability); on local Kind development clusters, flipping allow_sign_up to "true" is convenient.
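For a local Kind overlay, flipping the setting could look like the fragment below. It reuses the overlay patch path shown earlier for SSO; where exactly the local overlay keeps this patch is an assumption.

```yaml
# infrastructure/grafana/config/overlays/<cluster>/patches/grafana.yaml (fragment)
spec:
  config:
    users:
      # Auto-create the Grafana user record on first SSO login.
      allow_sign_up: "true"
```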
Local development notes
The local overlay (Kind) needs two settings that the production overlays don't:
- security.cookie_secure: "false" and security.cookie_samesite: "lax": required because local serves over HTTP and grafana.local / keycloak.local are distinct sites. SameSite=Strict would block the OIDC state cookie on the cross-site return trip.
- An emergency admin/admin fallback in security.admin_user / security.admin_password, so the cluster is still usable if Keycloak is down during development.
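Put together, the local-only settings could be expressed as the following patch on the Grafana CR. This is a sketch assembled from the values named above; the file location inside the local overlay is not shown here, and the admin/admin credentials are only acceptable on a throwaway development cluster.

```yaml
# Sketch of the local (Kind) overlay settings; never use these values on a real cluster.
spec:
  config:
    security:
      cookie_secure: "false"      # local serves over plain HTTP
      cookie_samesite: "lax"      # lets the OIDC state cookie survive the cross-site return
      admin_user: admin           # emergency fallback if Keycloak is down
      admin_password: admin
```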