# Monitoring with InfluxDB & Grafana

The metrics pipeline: host telemetry into a time-series database, visualised and alerted on through dashboards.

# Why metrics

Logs tell you *what happened*; metrics tell you *how things are trending*. Is the host running out of memory? Is disk filling up? Is CPU pegged? Those are numbers over time, and you want them charted and alertable, not discovered by accident.

The lab runs a metrics stack on the `Monitoring` VM (`10.100.100.4`): **InfluxDB** as the time-series database and **Grafana** as the dashboard layer on top. The first thing it watches is the foundation everything else stands on: the Proxmox host itself.

> **Why we use this:** you can't manage what you can't see. A host quietly creeping toward full RAM or disk is the kind of thing that's obvious on a graph weeks in advance and catastrophic when discovered at failure time. Metrics turn slow-moving problems into things you notice early.
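
To make "obvious on a graph weeks in advance" concrete, here is a minimal Python sketch that fits a straight line to a few disk-usage samples and extrapolates to the point where the disk would be full. The sample values, the 500 GB capacity, and the function name `days_until_full` are invented for illustration and are not taken from the lab. Linear growth is itself an assumption that real usage may not follow.

```python
def days_until_full(samples, capacity):
    """Fit a least-squares line to (day, used) samples and extrapolate.

    Returns how many days after the last sample usage is projected to
    reach `capacity`, or None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)

    slope = sxy / sxx                      # growth per day
    if slope <= 0:
        return None                        # not growing, so never "full"
    intercept = mean_y - slope * mean_x    # projected usage at day 0
    day_full = (capacity - intercept) / slope
    return day_full - samples[-1][0]


# Hypothetical weekly readings: (day number, GB used) on a 500 GB disk.
weekly_usage = [(0, 400.0), (7, 414.0), (14, 428.0), (21, 442.0)]
remaining = days_until_full(weekly_usage, capacity=500.0)
print(remaining)  # 29.0 -> roughly four weeks of warning
```

In this stack the same idea would normally live in a Grafana panel or alert rule rather than in a script; the sketch only shows the arithmetic behind the trend line.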

# The pipeline

Metrics flow in one direction, from source to dashboard:

```
Proxmox host  --(built-in metric server)-->  InfluxDB  -->  Grafana
 (CPU, RAM,                                  (10.100.100.4)   dashboards
  disk, I/O)                                  bucket: proxmox  & alerts
                                                              -> monitoring.example.com
```

Proxmox has a built-in metric exporter — point it at an InfluxDB endpoint and it streams host (and VM) telemetry continuously. InfluxDB stores it as time series in a bucket. Grafana reads from InfluxDB and turns it into dashboards, published at `https://monitoring.example.com`.

This is the canonical shape of a metrics system: a **source** emits numbers, a **time-series database** stores them efficiently, and a **visualisation layer** makes them human. Swap InfluxDB for Prometheus and the three boxes stay the same, even though Prometheus scrapes its sources rather than receiving pushes from them. That portability is the point of learning it this way.

> **Why we use this:** the source → TSDB → dashboard pattern is universal. Once it clicks here, every other metrics stack (Prometheus + Grafana, cloud monitoring, etc.) is just a variation on the same three boxes.
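
For the source → TSDB hop, it helps to see what a single data point looks like on the wire. InfluxDB accepts writes in its line protocol: a measurement name, optional tags, one or more fields, and a timestamp. The Python sketch below builds such a line. It is a simplified formatter for illustration only (no NaN or infinity handling, for example), and the measurement, tag, and field names are hypothetical, not the names Proxmox actually emits.

```python
def _escape(value, chars):
    """Backslash-escape each character in `chars` inside `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value using line-protocol type conventions."""
    # bool is checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"          # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)          # floats are written bare
    # Anything else is sent as a quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=v field=v timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):        # sorted tag keys are recommended
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(val)}"
        for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical point: names are illustrative, not Proxmox's real schema.
line = to_line_protocol(
    "cpu_usage", {"host": "pve"}, {"value": 0.42, "cpus": 8},
    1700000000000000000,
)
print(line)  # cpu_usage,host=pve value=0.42,cpus=8i 1700000000000000000
```

Tags are indexed metadata (which host, which VM) and fields are the measured values, which is why a dashboard query can filter by host cheaply and then chart the field over time.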

## Diagram

![Metrics pipeline: source -> time-series DB -> dashboards](https://docs.devopsawi.com/uploads/images/gallery/2026-05/d-116.png)

# Two pillars, one pane of glass

Grafana is doing double duty in this lab, and that's deliberate. It's the dashboard front-end for **metrics** (from InfluxDB) *and* — because Grafana can query multiple data sources — the front-end for **logs** (from Loki; see the next book).

```
Grafana (10.100.100.4)
   |-- data source: InfluxDB  -> metrics (how much, how fast, trending)
   |-- data source: Loki      -> logs    (what happened, exact lines)
```

So when something looks off, you can pivot in one place: spot the CPU spike on a metric graph, then jump to the logs from that same window to see *why*. Metrics and logs are the first two "pillars of observability" (the third being traces), and having them behind one login is what makes investigating an incident feel like one workflow instead of two separate ones.

> **Why we use this:** correlation is where observability pays off. A spike on a graph and a stack trace in the logs are far more useful together than apart. Pointing one Grafana at both your TSDB and your log store is a cheap way to get that correlation.
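
The pivot from a metric spike to the logs around it is essentially a time-window join. The Python sketch below shows that logic on in-memory data; in the lab Grafana does this for you by sharing one time range across both data sources. The sample readings, log lines, threshold, and function names are all hypothetical.

```python
from datetime import datetime, timedelta


def find_spikes(points, threshold):
    """Return the timestamps of metric points above `threshold`."""
    return [ts for ts, value in points if value > threshold]


def logs_near(log_entries, moment, window):
    """Return log entries whose timestamp is within +/- `window` of `moment`."""
    return [(ts, line) for ts, line in log_entries if abs(ts - moment) <= window]


# Hypothetical CPU readings (percent), one per minute.
t0 = datetime(2026, 5, 1, 12, 0, 0)
cpu = [
    (t0, 12.0),
    (t0 + timedelta(minutes=1), 15.0),
    (t0 + timedelta(minutes=2), 97.0),   # the spike
    (t0 + timedelta(minutes=3), 14.0),
]

# Hypothetical log lines from the same host.
logs = [
    (t0 + timedelta(seconds=30), "backup job started"),
    (t0 + timedelta(minutes=1, seconds=50), "oom-killer invoked"),
    (t0 + timedelta(minutes=10), "cron run finished"),
]

spikes = find_spikes(cpu, threshold=90.0)
related = logs_near(logs, spikes[0], window=timedelta(seconds=60))
for ts, line in related:
    # Only the 12:01:50 entry falls inside the one-minute window.
    print(ts.isoformat(), line)
```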

# Lessons on monitoring

- **Watch the foundation first.** The Proxmox host's CPU, RAM, and disk are the most important things to graph, because everything else depends on the host staying healthy.
- **Learn the universal shape:** source → time-series DB → dashboards. InfluxDB or Prometheus, the pattern is the same.
- **One Grafana, many data sources.** Putting metrics *and* logs behind one pane turns incident investigation into a single workflow.
- **Set it up before you need it.** Monitoring you add *after* an outage is monitoring you didn't have during the outage.