Lesson: Dashboards, Exploring & Alerts
What you'll learn
- What a Grafana data source is and why Grafana stores no data itself.
- How to build a dashboard out of panels that query a data source.
- How to use Explore to ask ad-hoc questions without building anything.
- How basic alerting works: rules, conditions, and notifications.
- How to log in to and navigate the lab Grafana at 10.100.100.4.
By the end you'll be able to open Grafana, explore live data, build a panel, and create a simple alert.
The lesson
1. What Grafana is (and isn't)
Grafana is a visualization and alerting tool. It is the "front end" of observability: dashboards, graphs, alerts, and an exploration UI. Crucially, Grafana stores no telemetry of its own. It connects to data sources and runs queries against them, then draws the results.
           GRAFANA (10.100.100.4)
  +---------------------------------------+
  | Dashboards  |   Explore   |  Alerting |
  +-------------------+-------------------+
                      | queries
+--------------+--------------+-------------+
v              v              v             v
InfluxDB     Loki           Mimir         Tempo
(metrics) (logs, .5)      (metrics)     (traces)
The lab Grafana lives at http://10.100.100.4 (reached through the Jumpbox bastion; TLS is terminated at the pfSense HAProxy edge). Log in with the lab Grafana admin credentials (use <REDACTED> — never paste real passwords into docs or chats).
2. Data sources
A data source is a connection to a backend that holds telemetry. You configure it once under Connections → Data sources, give it a name, a URL, and any auth, then every panel can use it.
The lab has these data sources wired up:
- InfluxDB — time-series metrics (CPU, memory, request rates).
- Loki at http://10.100.100.5:3100, holding logs from every host (365-day retention).
- Optionally Mimir (metrics) and Tempo (traces) in the wider stack.
Each data source has its own query language. InfluxDB uses InfluxQL/Flux, Loki uses LogQL, and Mimir/Prometheus-style sources use PromQL. Grafana adapts its query editor to whichever source the panel points at.
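A data source is ultimately just a small record of name, type, URL, and auth. As a minimal sketch, the body that Grafana's HTTP API accepts at POST /api/datasources can be built like this. The display name "Loki" and the absence of auth are assumptions for illustration; the URL is the lab Loki address given above.

```python
import json

def loki_datasource_payload(name: str, url: str) -> dict:
    """Build a request body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,        # display name shown in the data source picker
        "type": "loki",      # which backend plugin handles the queries
        "url": url,          # where Grafana sends those queries
        "access": "proxy",   # the Grafana server, not the browser, calls the backend
        "basicAuth": False,  # assumption: the lab Loki needs no credentials
    }

payload = loki_datasource_payload("Loki", "http://10.100.100.5:3100")
print(json.dumps(payload, indent=2))
```

Nothing is sent here; the sketch only prints the JSON, which mirrors the fields you fill in under Connections → Data sources.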
3. Panels and dashboards
A panel is a single visualization — a time-series graph, a stat number, a gauge, a table, or a logs view. A panel has:
- A data source (where to query).
- A query (what to fetch).
- A visualization type and options (how to draw it).
A dashboard is a collection of panels arranged on a grid, sharing a time range (top-right, e.g. "last 6 hours") and often variables (dropdowns like $host that let one dashboard serve many targets).
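Under the hood a dashboard is a JSON document, and the three parts of a panel map onto three fields. The following is a rough sketch of that shape, not a complete dashboard model; the dashboard title, panel title, and the data source UID "loki-lab" are hypothetical.

```python
import json

def make_panel(title, ds_type, ds_uid, expr, viz="timeseries"):
    """One panel = data source + query + visualization type."""
    return {
        "title": title,
        "type": viz,                                     # how to draw it
        "datasource": {"type": ds_type, "uid": ds_uid},  # where to query
        "targets": [{"refId": "A", "expr": expr}],       # what to fetch
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},    # position on the grid
    }

dashboard = {
    "title": "myapp overview",                # hypothetical title
    "time": {"from": "now-6h", "to": "now"},  # time range shared by all panels
    "panels": [
        make_panel(
            "Error log lines per service",
            "loki",
            "loki-lab",                       # hypothetical data source UID
            'sum by (service) (count_over_time({job="myapp"} |= "error" [5m]))',
        )
    ],
}

print(json.dumps(dashboard, indent=2))
```

You can compare this against a real dashboard by opening its settings in Grafana and viewing the JSON model.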
To build your first panel:
- Click + → New dashboard → Add visualization.
- Pick a data source (say Loki).
- Write a query. For Loki, a LogQL query to count error log lines per service:
sum by (service) (count_over_time({job="myapp"} |= "error" [5m]))
Here {job="myapp"} selects log streams by label, |= "error" keeps only lines containing "error", count_over_time(...[5m]) counts matches per 5-minute window, and sum by (service) adds those counts up for each value of the service label.
- Choose Time series as the visualization, give the panel a title, and Save dashboard.
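To make the semantics of that LogQL query concrete, here is a small pure-Python imitation of what it computes at a single evaluation time. It is only a sketch: the sample log entries are made up, and real Loki counts per stream before aggregating, whereas this version folds both steps into one loop.

```python
from collections import Counter

# Hypothetical sample data: (unix seconds, stream labels, log line)
LOGS = [
    (10,  {"job": "myapp", "service": "checkout"}, "error: card declined"),
    (60,  {"job": "myapp", "service": "checkout"}, "request ok"),
    (120, {"job": "myapp", "service": "search"},   "error: index timeout"),
    (200, {"job": "other", "service": "checkout"}, "error: unrelated job"),
    (290, {"job": "myapp", "service": "checkout"}, "upstream error, retrying"),
    (400, {"job": "myapp", "service": "checkout"}, "error: outside the window"),
]

def errors_per_service(logs, eval_time, window=300):
    """Count lines containing "error" per service in the window ending at eval_time."""
    counts = Counter()
    for ts, labels, line in logs:
        in_window = eval_time - window < ts <= eval_time  # the [5m] range
        selected = labels.get("job") == "myapp"           # {job="myapp"}
        matches = "error" in line                         # |= "error" (case-sensitive)
        if in_window and selected and matches:
            counts[labels["service"]] += 1                # sum by (service)
    return dict(counts)

result = errors_per_service(LOGS, eval_time=300)
print(result)
```

A time-series panel repeats this evaluation at each step across the dashboard's time range, which is what produces one line per service.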
A metric panel using a PromQL-style query (against Mimir/Prometheus) looks like:
sum(rate(http_requests_total{job="myapp",status=~"5.."}[5m]))
This is the per-second rate of HTTP 5xx errors over 5-minute windows — a classic "is my app erroring?" panel.
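For intuition, the arithmetic behind rate(...[5m]) can be approximated in a few lines. This is a deliberately simplified sketch: it divides the counter increase by the elapsed time between the first and last sample in the window, and skips the counter-reset handling and extrapolation that Prometheus-style backends perform. The sample values are invented.

```python
def simple_rate(samples, eval_time, window=300):
    """Approximate per-second increase of one counter series over the window."""
    pts = [(t, v) for t, v in samples if eval_time - window < t <= eval_time]
    if len(pts) < 2:
        return None  # not enough samples to compute a rate
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical http_requests_total samples as (unix seconds, counter value),
# one series per 5xx status code.
series = {
    "500": [(30, 100), (90, 106), (150, 112), (210, 118), (270, 124)],
    "503": [(30, 10), (90, 13), (150, 16), (210, 19), (270, 22)],
}

rates = {status: simple_rate(pts, eval_time=300) for status, pts in series.items()}
total = sum(r for r in rates.values() if r is not None)  # the outer sum(...)
print(f"5xx errors per second: {total:.3f}")
```

The outer sum collapses the per-status series into the single line you see on the panel.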
4. Explore — ad-hoc questions
Explore (the compass icon) is for investigating right now without building a dashboard. You pick a data source, type a query, and iterate. It's where the observability workflow happens during an incident.
A typical Explore session:
- Open Explore, choose Loki.
- Start broad: {job="myapp"} to see all the app's logs.
- Narrow down: {job="myapp"} |= "timeout" to find timeouts.
- Parse and filter further: {job="myapp"} | logfmt | level="error" (the logfmt parser splits key=value log lines into labels you can filter on).
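The last step of that session, parsing with logfmt and then filtering on a label, can be imitated in plain Python to show what the parser does to each line. This is a simplified stand-in rather than Loki's own parser (for example, it would choke on an unbalanced quote), and the log lines are made up.

```python
import shlex

def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore bare words that carry no '='
            fields[key] = value
    return fields

# Hypothetical log lines from the app
lines = [
    'level=info msg="request ok" status=200',
    'level=error msg="upstream timeout" status=504',
    'level=error msg="db refused connection" status=500',
]

# Equivalent in spirit to: | logfmt | level="error"
errors = [f for f in map(parse_logfmt, lines) if f.get("level") == "error"]
for entry in errors:
    print(entry["status"], entry["msg"])
```

Once fields such as level and status exist as labels, any of them can be used in a filter, which is why parsing early in the pipeline makes narrowing down much faster.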
Explore also lets you split the view to compare two queries side by side, and jump from a log line to its trace (if trace IDs are present), which is one hop of the metric → trace → log workflow.
5. Basic alerting
Grafana Alerting turns a query into an automatic notification. An alert rule has these parts:
QUERY --> CONDITION --> [pending for N min] --> FIRING --> NOTIFICATION
(data)    (threshold)   (avoid flapping)        (state)    (contact point)
The pieces:
- Query: e.g. the 5xx error rate PromQL above, or a Loki count.
- Condition / threshold: e.g. IS ABOVE 0.05 (5%, which assumes the query returns an error ratio; the raw 5xx rate above is in errors per second, so its threshold would be in those units).
- Evaluation interval & "for" duration: how often to check, and how long the condition must hold before firing. The "for" duration stops brief spikes from paging you and keeps the alert from flapping between states.
- Labels & annotations: metadata and a human message ("Checkout 5xx error rate is {{ $value }}").
- Contact point: where the alert goes, such as email, Slack, or a webhook. (The lab sends email via the SMTP configured in grafana.ini.)
- Notification policy: routes alerts to the right contact point based on labels.
To create one: Alerting → Alert rules → New alert rule, define the query, set the condition, choose the evaluation interval, set a "for" of e.g. 5m, attach a contact point via the notification policy, and save. An alert moves through states: Normal → Pending → Firing → (Resolved).
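The effect of the "for" duration on those states is easiest to see by stepping through a few evaluations. Below is a minimal sketch of the state logic, assuming one evaluation per interval and a simple "value above threshold" condition; the values are invented error ratios, and Grafana's real evaluator has more states and options than shown here.

```python
def alert_states(values, threshold, interval_s, for_s):
    """Return the alert state after each evaluation of the rule."""
    states = []
    breaching_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * interval_s
        if value > threshold:
            if breaching_since is None:
                breaching_since = now
            held = now - breaching_since
            states.append("Firing" if held >= for_s else "Pending")
        else:
            breaching_since = None  # condition cleared: back to Normal
            states.append("Normal")
    return states

# Hypothetical error ratios, evaluated once a minute, with a 5-minute "for".
values = [0.01, 0.08, 0.02, 0.07, 0.09, 0.10, 0.08, 0.06, 0.09, 0.01]
for minute, state in enumerate(alert_states(values, 0.05, 60, 300)):
    print(f"t={minute}m value={values[minute]:.2f} -> {state}")
```

The one-minute spike at t=1m only reaches Pending and never notifies anyone; the sustained breach starting at t=3m fires once it has held for the full five minutes.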
6. Good dashboard and alert habits
- One dashboard, one story. A service dashboard should answer "is this service healthy?" at a glance — the classic RED metrics: Rate, Errors, Duration.
- Alert on symptoms, not causes. Alert on "users are seeing errors," not "CPU is at 80%." High CPU might be fine; user-facing errors never are.
- Every alert needs an action. If no human needs to do anything, it shouldn't page — make it a dashboard, not an alert. This prevents alert fatigue.
- Use variables so one dashboard covers all hosts/pods instead of copy-pasting panels.
7. Try it in the lab
Open Grafana at 10.100.100.4, go to Explore, pick the Loki data source, and run {job=~".+"} to see what's flowing in from across the lab. Then pick a host label and narrow down. That five-minute exercise is the heart of everything in this module.
Dig deeper
- Grafana — Dashboards documentation
- Grafana — Explore
- Grafana — Alerting documentation
- Grafana — Data sources
- Grafana Loki — LogQL query language
Search terms
- grafana create dashboard panel tutorial
- grafana explore logs loki
- grafana alerting rule contact point notification policy
- RED method rate errors duration dashboard
- grafana data source configuration
- logql query examples grafana
Check yourself
- Why is it accurate to say "Grafana stores no telemetry of its own"?
- What three things does every panel need?
- What is Explore best used for, compared to a dashboard?
- In an alert rule, what is the purpose of the "for" duration, and what problem does it prevent?
- What does "alert on symptoms, not causes" mean, and why does it reduce alert fatigue?