Lesson: Dashboards, Exploring & Alerts

What you'll learn

What a Grafana data source is and why Grafana stores no data itself.
How to build a dashboard out of panels that query a data source.
How to use Explore to ask ad-hoc questions without building anything.
How basic alerting works: rules, conditions, and notifications.
How to log in to and navigate the lab Grafana at 10.100.100.4.

By the end you'll be able to open Grafana, explore live data, build a panel, and create a simple alert.

The lesson

1. What Grafana is (and isn't)

Grafana is a visualization and alerting tool. It is the "front end" of observability: dashboards, graphs, alerts, and an exploration UI. Crucially, Grafana stores no telemetry of its own. It connects to data sources and runs queries against them, then draws the results.

          GRAFANA  (10.100.100.4)
   +---------------------------------------+
   |  Dashboards | Explore | Alerting       |
   +-------------------+-------------------+
                       | queries
        +--------------+--------------+-------------+
        v              v              v             v
   InfluxDB         Loki           Mimir         Tempo
   (metrics)     (logs, .5)      (metrics)      (traces)

The lab Grafana lives at http://10.100.100.4 (reached through the Jumpbox bastion; TLS is terminated at the pfSense HAProxy edge). Log in with the lab Grafana admin credentials (use <REDACTED> — never paste real passwords into docs or chats).

2. Data sources

A data source is a connection to a backend that holds telemetry. You configure it once under Connections → Data sources by giving it a name, a URL, and any auth; after that, every panel can use it.

The lab has these data sources wired up:

InfluxDB — time-series metrics (CPU, memory, request rates).
Loki at http://10.100.100.5:3100 — logs from every host (365-day retention).
Optionally Mimir (metrics) and Tempo (traces) in the wider stack.

Each data source has its own query language. InfluxDB uses InfluxQL/Flux, Loki uses LogQL, and Mimir/Prometheus-style sources use PromQL. Grafana adapts its query editor to whichever source the panel points at.
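
The same name/type/URL settings can also be written as JSON for Grafana's HTTP API (a POST to /api/datasources). The sketch below only builds and prints that body for the lab Loki; it sends nothing. The display name "Loki (lab)" is an assumption for illustration, and authentication is left out.

```python
import json

def loki_datasource_payload(name: str, url: str) -> dict:
    """Build the JSON body describing a Loki data source."""
    return {
        "name": name,        # display name shown in the data source picker
        "type": "loki",      # which plugin (and query language) to use
        "url": url,          # where the backend listens
        "access": "proxy",   # Grafana's server makes the request, not the browser
    }

# "Loki (lab)" is a hypothetical name; the URL is the lab Loki from this lesson.
payload = loki_datasource_payload("Loki (lab)", "http://10.100.100.5:3100")
print(json.dumps(payload, indent=2))
```

Notice what is missing: no log lines, no metrics. The data source is only a pointer to where the data lives.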

3. Panels and dashboards

A panel is a single visualization — a time-series graph, a stat number, a gauge, a table, or a logs view. A panel has:

A data source (where to query).
A query (what to fetch).
A visualization type and options (how to draw it).

A dashboard is a collection of panels arranged on a grid, sharing a time range (top-right, e.g. "last 6 hours") and often variables (dropdowns like $host that let one dashboard serve many targets).
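
Under the hood a dashboard is saved as a JSON document. The sketch below builds a heavily trimmed version of that model to show how the pieces fit together: a shared time range, one variable, and two panels on the grid. Real exports carry many more fields, and the dashboard title, the host label, and the values web01/web02 are hypothetical.

```python
import json

def make_panel(title, datasource_type, expr, x, y):
    """One panel: a data source, a query, and a visualization type."""
    return {
        "type": "timeseries",                           # how to draw it
        "title": title,
        "datasource": {"type": datasource_type},        # where to query
        "targets": [{"refId": "A", "expr": expr}],      # what to fetch
        "gridPos": {"x": x, "y": y, "w": 12, "h": 8},   # half of the 24-column grid
    }

dashboard = {
    "title": "myapp overview",                   # hypothetical title
    "time": {"from": "now-6h", "to": "now"},     # shared "last 6 hours" range
    "templating": {"list": [
        # A dropdown named $host; the host names are made up for this sketch.
        {"name": "host", "type": "custom", "query": "web01,web02"},
    ]},
    "panels": [
        make_panel("Error log lines", "loki",
                   'sum(count_over_time({job="myapp", host="$host"} |= "error" [5m]))',
                   0, 0),
        make_panel("5xx rate", "prometheus",
                   'sum(rate(http_requests_total{job="myapp",status=~"5.."}[5m]))',
                   12, 0),
    ],
}
print(json.dumps(dashboard, indent=2))
```

Because the Loki query refers to $host, changing the dropdown re-runs that panel for a different host without editing the panel.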

To build your first panel:

Click + → New dashboard → Add visualization.
Pick a data source (say Loki).
Write a query. For Loki, a LogQL query to count error log lines per service:
```
sum by (service) (count_over_time({job="myapp"} |= "error" [5m]))
```
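Here {job="myapp"} selects log streams by label, |= "error" keeps only lines containing "error" (the match is case-sensitive), count_over_time(...[5m]) counts the matching lines in each 5-minute window, and sum by (service) adds those counts up per service, assuming the streams carry a service label.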
Choose Time series as the visualization, give the panel a title, and Save dashboard.
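
When the panel runs, Grafana sends that query to Loki over HTTP. The sketch below builds the kind of URL involved, using Loki's query_range endpoint, to make the "Grafana only queries" idea concrete. It constructs the URL but does not call it; the timestamps are arbitrary example values, and it assumes you would run a real request from a host that can reach the lab Loki.

```python
from urllib.parse import urlencode

LOKI = "http://10.100.100.5:3100"
QUERY = 'sum by (service) (count_over_time({job="myapp"} |= "error" [5m]))'

def query_range_url(base, logql, start_s, end_s, step_s):
    """Build a Loki range-query URL from a LogQL string and a time range."""
    params = {
        "query": logql,
        "start": str(start_s * 1_000_000_000),  # Unix time in nanoseconds
        "end": str(end_s * 1_000_000_000),
        "step": str(step_s),                     # resolution in seconds
    }
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"

# One hour of data at one point per minute (arbitrary example timestamps).
url = query_range_url(LOKI, QUERY, 1_700_000_000, 1_700_003_600, 60)
print(url)
```

The dashboard's time picker supplies the start and end, which is why changing "last 6 hours" re-runs every panel.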

A metric panel using a PromQL-style query (against Mimir/Prometheus) looks like:

sum(rate(http_requests_total{job="myapp",status=~"5.."}[5m]))

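This is the per-second rate of HTTP 5xx responses, averaged over the last 5 minutes at each point on the graph; status=~"5.." is a regex match on any status code starting with 5. It is a classic "is my app erroring?" panel.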
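
To see what a per-second rate means, here is the arithmetic on a handful of counter samples. This is a simplified sketch, not Prometheus's exact algorithm: the real rate() also extrapolates to the edges of the window. The sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when there are not enough samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. the process restarted),
        # so everything counted since the reset is new.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed

# Counter of 5xx responses scraped every 60 s; it resets before t=180.
samples = [(0, 100), (60, 112), (120, 130), (180, 5), (240, 30)]
print(simple_rate(samples))  # 60 errors over 240 s
```

The counter itself only ever goes up (apart from resets), which is why you graph its rate rather than its raw value.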

4. Explore — ad-hoc questions

Explore (the compass icon) is for investigating right now without building a dashboard. You pick a data source, type a query, and iterate. It's where most of the investigation happens during an incident.

A typical Explore session:

Open Explore, choose Loki.
Start broad: {job="myapp"} to see all the app's logs.
Narrow down: {job="myapp"} |= "timeout" to find timeouts.
Parse and filter further: {job="myapp"} | logfmt | level="error" (the logfmt parser splits key=value log lines into labels you can filter on).

Explore also lets you split the view to compare two queries side by side, and jump from a log line to a trace (if trace IDs are present). Moving between signals like this is what the metric → trace → log workflow relies on.
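
If the logfmt step in the session above feels abstract, this small sketch mimics what | logfmt | level="error" does: split each line into key=value fields, then filter on one of them. It is an illustration in Python, not Loki's parser; the sample log lines are invented, and shlex would reject a line containing an unbalanced quote.

```python
import shlex

def parse_logfmt(line):
    """Split a logfmt line into a dict of key -> value."""
    fields = {}
    for token in shlex.split(line):          # respects "quoted values"
        key, sep, value = token.partition("=")
        if sep:                               # skip tokens without '='
            fields[key] = value
    return fields

lines = [
    'level=error msg="upstream timeout" service=checkout duration=5.2s',
    'level=info msg="request served" service=checkout duration=0.1s',
]

# Equivalent in spirit to: {job="myapp"} | logfmt | level="error"
errors = [line for line in lines if parse_logfmt(line).get("level") == "error"]
print(errors)
```

Once the fields are extracted you can filter on any of them, for example service or duration, not just level.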

5. Basic alerting

Grafana Alerting turns a query into an automatic notification. An alert rule has these parts:

  QUERY  -->  CONDITION  -->  [pending for N min]  -->  FIRING  -->  NOTIFICATION
  (data)      (threshold)      (avoid flapping)         (state)      (contact point)

The pieces:

Query: e.g. the 5xx error rate PromQL above, or a Loki count.
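Condition / threshold: e.g. IS ABOVE 0.05. That reads as 5% only when the query returns an error ratio (5xx rate divided by total request rate); against the raw 5xx rate above it means 0.05 errors per second.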
Evaluation interval & "for" duration: how often to check, and how long the condition must hold before firing. The "for" duration stops brief spikes from paging you ("flapping").
Labels & annotations: metadata and a human message ("Checkout 5xx error rate is {{ $value }}").
Contact point: where the alert goes — email, Slack, a webhook. (The lab sends email via the SMTP configured in grafana.ini.)
Notification policy: routes alerts to the right contact point based on labels.
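
Routing by labels can be pictured as a lookup: the first policy whose matchers all agree with the alert's labels decides the contact point. The sketch below is a deliberately flat, equality-only version of that idea. Grafana's real notification policies form a tree and support more matcher types, and every label and contact point name here is hypothetical.

```python
def route(alert_labels, policies, default):
    """Return the contact point of the first policy whose matchers all match."""
    for matchers, contact_point in policies:
        if all(alert_labels.get(key) == value for key, value in matchers.items()):
            return contact_point
    return default

# Hypothetical policies: (label matchers, contact point name).
policies = [
    ({"team": "payments", "severity": "critical"}, "payments-oncall-email"),
    ({"team": "payments"}, "payments-slack"),
]

alert = {"alertname": "Checkout5xx", "team": "payments", "severity": "critical"}
print(route(alert, policies, default="lab-email"))
```

This is why labels on the alert rule matter: they are what the policy matches against.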

To create one: Alerting → Alert rules → New alert rule, define the query, set the condition, choose the evaluation interval, set a "for" of e.g. 5m, attach a contact point via the notification policy, and save. An alert moves through states: Normal → Pending → Firing → (Resolved).
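
The state changes are easier to follow with a small simulation. The sketch below evaluates a series of values once per interval and applies a "for" duration before firing. It is a simplified model of the behaviour described above, not Grafana's scheduler; the values are invented error ratios, and it uses a 2-minute "for" instead of 5 minutes to keep the output short.

```python
def evaluate(values, threshold, interval_s, for_s):
    """Return the alert state after each evaluation of the query."""
    states = []
    breaching_since = None                 # when the condition first became true
    for i, value in enumerate(values):
        now = i * interval_s
        if value > threshold:
            if breaching_since is None:
                breaching_since = now
            held_for = now - breaching_since
            states.append("Firing" if held_for >= for_s else "Pending")
        else:
            breaching_since = None         # condition cleared: start over
            states.append("Normal")
    return states

# Error ratio sampled every 60 s; alert when above 5% for at least 120 s.
values = [0.01, 0.09, 0.02, 0.08, 0.08, 0.08, 0.01]
print(evaluate(values, threshold=0.05, interval_s=60, for_s=120))
```

The single spike at the second evaluation only reaches Pending and then clears, so nobody is notified; only the sustained breach reaches Firing.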

6. Good dashboard and alert habits

One dashboard, one story. A service dashboard should answer "is this service healthy?" at a glance — the classic RED metrics: Rate, Errors, Duration.
Alert on symptoms, not causes. Alert on "users are seeing errors," not "CPU is at 80%." High CPU might be fine; user-facing errors never are.
Every alert needs an action. If no human needs to do anything, it shouldn't page — make it a dashboard, not an alert. This prevents alert fatigue.
Use variables so one dashboard covers all hosts/pods instead of copy-pasting panels.

7. Try it in the lab

Open Grafana at 10.100.100.4, go to Explore, pick the Loki data source, and run {job=~".+"} to see what's flowing in from across the lab. Then pick a host label and narrow down. That five-minute exercise is the heart of everything in this module.

Dig deeper

Search terms

grafana create dashboard panel tutorial
grafana explore logs loki
grafana alerting rule contact point notification policy
RED method rate errors duration dashboard
grafana data source configuration
logql query examples grafana

Check yourself

Why is it accurate to say "Grafana stores no telemetry of its own"?
What three things does every panel need?
What is Explore best used for, compared to a dashboard?
In an alert rule, what is the purpose of the "for" duration, and what problem does it prevent?
What does "alert on symptoms, not causes" mean, and why does it reduce alert fatigue?