Grafana

Grafana is an open-source platform for monitoring and observability, specializing in data visualization. It allows users to create, explore, and share dashboards with real-time metrics, logs, and traces.

Role in the LGTM Stack

Grafana is a core component of the LGTM stack (Loki, Grafana, Tempo, Mimir), which is a modern observability suite:

  • Loki: Log aggregation and querying
  • Grafana: Visualization and dashboarding
  • Tempo: Distributed tracing
  • Mimir: Long-term storage and scalable metrics

Grafana acts as the central interface, enabling users to query and visualize data from Loki, Tempo, and Mimir in a single pane of glass.

Grafana Operator

The Grafana Operator is a Kubernetes-native way to deploy and manage Grafana instances, dashboards, and datasources. It automates the lifecycle of Grafana resources, ensuring consistency and reproducibility.

Key Features

  • Declarative Management: Grafana instances, dashboards, and datasources are defined as Kubernetes Custom Resources (CRs).
  • GitOps-Friendly: Configuration is stored as code in Git and applied to the cluster from there.
  • Automated Reconciliation: The operator ensures the actual state matches the desired state defined in the CRs.
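
As an illustration of declarative management, a datasource Custom Resource might look roughly like the sketch below. It uses the Grafana Operator's v1beta1 API; the resource name, the instance selector label, and the Loki URL are assumptions for illustration and are not taken from the HavenPlus repository.

```yaml
# Hypothetical sketch: name, label, and URL are illustrative assumptions.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: loki
  namespace: grafana
spec:
  # Selects which Grafana instance(s) receive this datasource;
  # the label must match one set on the Grafana CR.
  instanceSelector:
    matchLabels:
      dashboards: grafana
  datasource:
    name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.loki.svc:80   # assumed in-cluster Loki address
```

Once such a resource is applied, the operator reconciles it into the selected Grafana instance and re-applies it if the datasource drifts.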

Within the HavenPlus stack, by default we let the Grafana Operator deploy the following:

  • A Grafana instance in the grafana namespace
  • A Loki, Tempo & Mimir datasource in the grafana namespace
  • A set of Kubernetes dashboard resources in the grafana namespace

All of these resources can be found in infrastructure/grafana/config/base, with the required overlays in infrastructure/grafana/config/overlays/.

Furthermore, the Grafana Operator itself lives in the grafana-operator namespace.

Single Sign-On with Keycloak

Grafana is preconfigured to delegate authentication to the cluster's own Keycloak instance using OIDC.

Front-channel and back-channel URLs

OIDC authentication splits into two distinct traffic paths. Configuring them correctly is the most important part of cluster setup:

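Channel | What | Who calls it | Configured as
--------------|---------------------------------------------------------------------------------|---------------------------|-------------------
Front-channel | /auth: the authorization endpoint the browser is redirected to | The user's browser | auth_url
Back-channel | /token and /userinfo: the endpoints Grafana calls after the browser returns | Grafana pod (server-side) | token_url, api_url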

The front-channel URL must use the public Keycloak hostname (e.g. https://keycloak.<cluster>.example), because the browser is the one resolving it — it has no awareness of the cluster's internal DNS.

The back-channel URLs should point at the in-cluster Kubernetes Service (http://keycloak-service.keycloak-instances:8080). This keeps server-to-server token exchange inside the cluster: no LoadBalancer egress, no ingress hop, lower latency, smaller attack surface.

For this split to work, the Keycloak side must enable hostname.backchannelDynamic: true — see the Keycloak page for the matching Keycloak config. With both halves in place, the iss claim Keycloak puts into tokens stays canonical (the public hostname) regardless of which path the back-channel call took.
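
For orientation, the Keycloak half of this split could look something like the following sketch of a Keycloak Operator CR. The CR name is inferred from the keycloak-service Service name used above, and the http block is an assumption based on the plain-HTTP port 8080 back-channel URLs; the Keycloak page remains the authoritative source for the actual config.

```yaml
# Sketch only: metadata.name and the http block are assumptions.
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak                # the operator names the Service "<name>-service"
  namespace: keycloak-instances
spec:
  hostname:
    # Public, browser-facing URL; this is what ends up in the iss claim.
    # backchannelDynamic requires the hostname to be given as a full URL.
    hostname: "https://keycloak.<cluster>.example"
    # Lets server-side callers reach Keycloak via a different (in-cluster) URL.
    backchannelDynamic: true
  http:
    httpEnabled: true           # assumed, since the back-channel uses http on 8080
```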

Configuring SSO per cluster

The base Grafana CR in infrastructure/grafana/config/base/grafana.yaml already contains the full OIDC config skeleton with set-in-overlay placeholders. Each cluster overlay fills in the cluster-specific URLs:

    # infrastructure/grafana/config/overlays/<cluster>/patches/grafana.yaml
    spec:
      config:
        server:
          root_url: "https://grafana.<cluster>.example"
        auth.generic_oauth:
          # Front-channel: browser-facing public hostname.
          auth_url: "https://keycloak.<cluster>.example/realms/havenplus/protocol/openid-connect/auth"
          # Back-channel: in-cluster Service DNS.
          token_url: "http://keycloak-service.keycloak-instances:8080/realms/havenplus/protocol/openid-connect/token"
          api_url: "http://keycloak-service.keycloak-instances:8080/realms/havenplus/protocol/openid-connect/userinfo"
      httpRoute:
        spec:
          hostnames: [ "grafana.<cluster>.example" ]

The matching Grafana client redirectUris[0] must be patched in infrastructure/keycloak/instances/overlays/<cluster>/kustomization.yaml to https://grafana.<cluster>.example/login/generic_oauth.
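
One way such a patch could be written is as a Kustomize JSON 6902 patch, sketched below. The target resource name and the client index (0) are hypothetical; check the position of the Grafana client in the realm import before reusing the path.

```yaml
# infrastructure/keycloak/instances/overlays/<cluster>/kustomization.yaml (excerpt)
# Sketch only: the target name and the clients/0 index are assumptions.
patches:
  - target:
      kind: KeycloakRealmImport
      name: havenplus           # hypothetical resource name
    patch: |-
      - op: replace
        path: /spec/realm/clients/0/redirectUris/0
        value: https://grafana.<cluster>.example/login/generic_oauth
```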

Group → role mapping

Roles are derived from the user's Keycloak group membership:

    role_attribute_path: >-
      contains(groups[*], 'k8s-admins') && 'GrafanaAdmin' ||
      contains(groups[*], 'k8s-developers') && 'Editor' || 'Viewer'

  • Members of the k8s-admins realm group are mapped to GrafanaAdmin.
  • Members of k8s-developers are mapped to Editor.
  • Anyone else logging in successfully will be a Viewer.

To grant a Keycloak user access at a specific level, add them to the matching realm group in the havenplus realm.
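
To show where this expression sits, here is a sketch of the surrounding auth.generic_oauth section. The scopes value is an assumption, and it presumes Keycloak exposes a groups claim through a group membership mapper. Note that Grafana only honors a GrafanaAdmin result when allow_assign_grafana_admin is enabled; whether the base config already sets it is not stated here, so verify it in grafana.yaml.

```yaml
# Sketch only: scopes is an assumed value; verify both settings against the base config.
spec:
  config:
    auth.generic_oauth:
      scopes: "openid profile email groups"
      # Required for a GrafanaAdmin result to grant server admin rights.
      allow_assign_grafana_admin: "true"
      role_attribute_path: >-
        contains(groups[*], 'k8s-admins') && 'GrafanaAdmin' ||
        contains(groups[*], 'k8s-developers') && 'Editor' || 'Viewer'
```

The expression is evaluated left to right, so a user who belongs to both groups receives GrafanaAdmin.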

Granting a user access to Grafana

Once SSO is wired up via GitOps, granting a person access is a Keycloak admin task. Every step is performed in the Keycloak Admin Console:

  1. Open the Keycloak Admin Console at https://keycloak.<cluster>.example and log in as temp-admin (the default admin user created by the Keycloak Operator). The password lives in the keycloak Sealed Secret under infrastructure/keycloak/instances/overlays/<cluster>/.

  2. Switch to the havenplus realm using the realm selector in the top-left corner of the console.

  3. Create the user. Go to Users → Add user.

  4. Set the password. On the new user's page, open the Credentials tab → Set password.

  5. Assign the user to a group. Open the Groups tab → Join Group, then pick one of:

    • k8s-admins — Grafana GrafanaAdmin.
    • k8s-developers — Grafana Editor.
    • (no group) — Grafana Viewer.

    These groups are imported declaratively by realm.yaml. If they're missing, the KeycloakRealmImport did not run successfully — see Realm import limitations.

  6. Verify the login. In an incognito window, visit https://grafana.<cluster>.example, click Sign in with Keycloak, and log in as the new user. You should land on Grafana's home dashboard with the assigned role visible under Profile → Preferences. If you instead see User not allowed or Login failed, check that users.allow_sign_up is "true" in the overlay (see below) or pre-create a matching Grafana user.
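
For reference, the groups used in step 5 would typically be declared in a KeycloakRealmImport along the lines of the sketch below. The resource name and keycloakCRName are assumptions, and the real realm.yaml will contain considerably more (clients, mappers, and so on).

```yaml
# Sketch only: metadata.name and keycloakCRName are assumptions.
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: havenplus
  namespace: keycloak-instances
spec:
  keycloakCRName: keycloak      # must reference the Keycloak CR in this namespace
  realm:
    realm: havenplus
    enabled: true
    groups:
      - name: k8s-admins
      - name: k8s-developers
```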

Notes on allow_sign_up

The base config sets users.allow_sign_up: "false" — Grafana will not auto-create a Grafana user record on first SSO. Users must either be pre-created in Grafana, or allow_sign_up must be set to "true" in the overlay. On real clusters, prefer pre-creation (auditability); on local Kind development clusters, flipping allow_sign_up to "true" is convenient.
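
A minimal sketch of the overlay patch that flips this setting, assuming it lives in the same per-cluster patches/grafana.yaml file used for the SSO URLs:

```yaml
# infrastructure/grafana/config/overlays/<cluster>/patches/grafana.yaml (excerpt)
spec:
  config:
    users:
      # Overrides the base value of "false" so that a Grafana user
      # record is created on first SSO login.
      allow_sign_up: "true"
```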

Local development notes

The local overlay (Kind) needs two settings that the production overlays don't:

  • security.cookie_secure: "false" and security.cookie_samesite: "lax" — required because local serves over HTTP and grafana.local / keycloak.local are distinct sites. SameSite=Strict would block the OIDC state cookie on the cross-site return-trip.
  • An emergency admin/admin fallback in security.admin_user / security.admin_password, so the cluster is still usable if Keycloak is down during development.
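
Put together, the local-only settings might look like this sketch. The file location is assumed to follow the same overlay layout as the other clusters; the keys and values are the ones listed above.

```yaml
# Local (Kind) overlay patch for the Grafana CR; the file path is an assumption.
spec:
  config:
    security:
      # Local traffic is plain HTTP, so cookies cannot be marked Secure.
      cookie_secure: "false"
      # Lax lets the OIDC state cookie survive the return from keycloak.local.
      cookie_samesite: "lax"
      # Emergency fallback for development only; never use on a real cluster.
      admin_user: "admin"
      admin_password: "admin"
```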