Lesson: ICMP & Network Troubleshooting

What you'll learn

  • What ICMP is and why it's separate from TCP/UDP.
  • How ping and traceroute actually work.
  • A repeatable mental model for diagnosing "I can't reach X" — from the bottom of the stack up.
  • Which tool answers which question, so you stop guessing.

This chapter turns the previous three into a skill: finding where a connection breaks.


The lesson

1. ICMP — the network's diagnostic channel

ICMP (Internet Control Message Protocol) doesn't carry application data the way TCP and UDP do. It's how the network reports status and errors about IP itself. When a router can't deliver a packet, or a host wants to check if another host is alive, ICMP is the messenger.

Two ICMP message types you'll use constantly:

  • Echo Request / Echo Reply → this is what ping uses ("are you there?" / "yes").
  • Time Exceeded (often called "TTL Exceeded") → sent by a router when a packet's "time to live" hits zero. This is the trick that makes traceroute work.

Note: many firewalls block ICMP. So "ping fails" does not always mean "host is down" — it may just mean ICMP is filtered while TCP services work fine. Keep that in mind; it's a classic false alarm.
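To make the Echo Request concrete, here is a minimal sketch of how its bytes are laid out, assuming the standard 8-byte ICMP header (type, code, checksum, identifier, sequence) and the Internet checksum. The function names and the identifier, sequence, and payload values are illustrative, not part of any ping implementation. The sketch only builds the bytes; actually sending them needs a raw socket, which usually requires elevated privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # type 8, code 0
ICMP_ECHO_REPLY = 0    # type 0, code 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP Echo Request (header + payload)."""
    # Header: type (1 byte), code (1), checksum (2), identifier (2), sequence (2).
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


# Illustrative values only.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())
```

A receiver can validate the packet by running the same checksum over all of its bytes: a correct packet sums to zero.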

2. ping — is the host reachable at the IP layer?

ping 10.100.100.7

ping sends ICMP Echo Requests and times the replies. It answers one narrow question: can IP packets get to that address and back? It tells you about reachability and latency — not whether any particular service (port) is working.

64 bytes from 10.100.100.7: icmp_seq=1 ttl=64 time=0.42 ms   ← reachable, fast
Request timeout for icmp_seq=1                                ← no reply (down? firewalled? wrong subnet?)
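If you want to summarize a ping transcript rather than eyeball it, a small parser is enough. The sketch below assumes reply and timeout lines shaped like the two samples above; the exact wording differs between ping implementations, so treat the patterns and the summarize_ping name as assumptions to adapt.

```python
import re

# Matches reply lines such as:
#   64 bytes from 10.100.100.7: icmp_seq=1 ttl=64 time=0.42 ms
REPLY_LINE = re.compile(
    r"bytes from (?P<host>\S+): icmp_seq=(?P<seq>\d+) ttl=(?P<ttl>\d+) time=(?P<ms>[\d.]+) ms"
)


def summarize_ping(output: str) -> dict:
    """Count replies and timeouts, and average the round-trip time."""
    times = [float(m.group("ms")) for m in REPLY_LINE.finditer(output)]
    # Assumption: lost probes are reported on a line containing "timeout".
    timeouts = sum(1 for line in output.splitlines() if "timeout" in line.lower())
    return {
        "replies": len(times),
        "timeouts": timeouts,
        "avg_ms": sum(times) / len(times) if times else None,
    }


sample = """\
64 bytes from 10.100.100.7: icmp_seq=1 ttl=64 time=0.42 ms
Request timeout for icmp_seq=2
64 bytes from 10.100.100.7: icmp_seq=3 ttl=64 time=0.58 ms
"""
print(summarize_ping(sample))
```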

3. traceroute: where along the path does it break?

traceroute 10.100.100.7      # (or tracepath, or mtr for a live view)

traceroute maps the routers (hops) between you and the destination. The clever trick: it sends packets with TTL=1, then TTL=2, then TTL=3… Each router decrements the TTL by one, and the router that brings it to zero drops the packet and sends back an ICMP "TTL Exceeded" message, revealing its own address. Hop by hop, the path is mapped, although a hop that doesn't send that message shows up only as * * *.

1  10.100.100.1   0.3 ms     ← your gateway
2  * * *                     ← this hop didn't answer (often just ICMP-filtered, not necessarily broken)
3  203.0.113.1    8.1 ms     ← out toward the internet

Use it when ping to a remote destination fails and you want to know whether the problem is near you (your gateway) or far away.
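The TTL trick is easy to see in a toy simulation. The sketch below sends no packets: it models a hypothetical three-hop path as a list of addresses and shows which hop answers for each starting TTL. PATH, send_probe, and trace are made-up names for illustration.

```python
from typing import List, Tuple

# Hypothetical path: two routers, then the destination host.
PATH = ["10.100.100.1", "198.51.100.1", "203.0.113.7"]


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe; return (responder, kind of reply)."""
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            # The packet dies here, so this router reveals itself.
            return router, "time-exceeded"
    return path[-1], "destination-reached"


def trace(path: List[str], max_ttl: int = 30) -> List[Tuple[int, str]]:
    """Probe with TTL=1, 2, 3... until the destination itself answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = send_probe(path, ttl)
        hops.append((ttl, responder))
        if kind == "destination-reached":
            break
    return hops


for ttl, responder in trace(PATH):
    print(ttl, responder)
```

Real traceroute also times each probe and waits for replies that may never come, which is where the * * * rows come from.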

4. A repeatable troubleshooting model: bottom-up

When something "can't connect," don't guess randomly. Walk the stack from the bottom up — each layer depends on the one below:

5. Application?  → Is the service itself healthy? (logs, ss -tlnp on the server)
4. Transport?    → Is the PORT open/listening? (nc -vz host port)   ← refused vs timeout!
3. Naming?       → Does the NAME resolve to the right IP? (getent hosts / dig)
2. Routing?      → Can IP packets reach the host? (ping; traceroute if remote)
1. Link/Local?   → Do I even have an IP + gateway? (ip addr / ip route)
   ▲
   start here, move up only once each layer checks out
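The "refused vs timeout" distinction at the transport layer can be checked from code as well as with nc. Here is a minimal sketch using Python's socket module; check_port is a made-up helper name, and the demo runs against a throwaway listener on the loopback address so that it does not depend on your network.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # handshake completed: something is listening
    except ConnectionRefusedError:
        return "refused"       # host answered, but nothing accepts on that port
    except socket.timeout:
        return "timeout"       # no answer at all: dropped, filtered, or unreachable
    except OSError as exc:
        return f"error: {exc}" # e.g. no route to host, name not resolved


# Loopback demo: start a listener on a free port, probe it, close it, probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(check_port("127.0.0.1", demo_port))
listener.close()
print(check_port("127.0.0.1", demo_port))
```

A "refused" result means packets reached the host and came back, so the lower layers are fine. A "timeout" tells you less: the problem could be a firewall silently dropping packets or a host that is not reachable at all.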

Worked example — "I can't reach the docs site at docs.example.com":

  1. ip addr / ip route → I have 10.100.100.x/24 and a default gateway. ✔
  2. getent hosts docs.example.com → resolves to the expected IP. ✔ (if not, it's DNS)
  3. ping <that IP> → replies. ✔ (if not, routing/host — try traceroute)
  4. nc -vz <that IP> 443 → refused. ✘ → the host is up but nothing is listening on 443 → the web service is down or on a different port. Now go read the service's logs.

Each step rules out a whole class of causes. That's the difference between diagnosing and guessing.
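The same walk can be written as a loop that stops at the first failing layer. This is a sketch with stub probes that mirror the worked example; first_failing_layer and the layer labels are illustrative, and real probes would shell out to ip, getent, ping, and nc or use equivalent library calls.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in order, lowest layer first; return the first one that fails."""
    for layer, probe in checks:
        if not probe():
            return layer  # stop here: higher layers depend on this one
    return None  # every layer checked out


# Stub results in the order the worked example uses.
checks = [
    ("link/local", lambda: True),   # ip addr / ip route look fine
    ("naming", lambda: True),       # getent hosts resolves as expected
    ("routing", lambda: True),      # ping gets replies
    ("transport", lambda: False),   # nc reports "refused"
    ("application", lambda: True),  # never evaluated
]
print(first_failing_layer(checks))
```

Stopping at the first failure is the point of the model: there is little value in reading application logs while the port is not even accepting connections.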

5. The toolbox, mapped to questions

Question                                  Tool
─────────────────────────────────────────────────────────
Do I have an address & gateway?           ip addr / ip route
Does this name resolve, and to what?      getent hosts / dig / nslookup
Is the host reachable at all?             ping
Where does the path break (remote)?       traceroute / mtr
Is this specific port open?               nc -vz / ss
What's listening on this machine?         ss -tlnp
Can I do a full HTTP request?             curl -v

Dig deeper

  • Cloudflare Learning — What is ICMP?: https://www.cloudflare.com/learning/ddos/glossary/internet-control-message-protocol-icmp/
  • Julia Evans — How to be a wizard programmer / networking zines (wonderfully approachable): https://jvns.ca/
  • DigitalOcean — How To Use Traceroute and MTR to Diagnose Network Issues: https://www.digitalocean.com/community/tutorials/how-to-use-traceroute-and-mtr-to-diagnose-network-issues
  • mtr (combines ping + traceroute into a live view): https://www.redhat.com/sysadmin/linux-mtr-command

Search terms

  • what is ICMP protocol explained
  • how does traceroute work TTL
  • why does ping fail but website works (the ICMP-filtered gotcha)
  • network troubleshooting methodology bottom up OSI
  • nc netcat test if port is open

Check yourself

  1. What is ICMP for, and how is it different from TCP/UDP?
  2. What single question does ping answer — and what does it not tell you?
  3. Explain the TTL trick that lets traceroute discover each hop.
  4. ping to a host fails. Give two innocent reasons that are not "the host is down."
  5. Put these in the order you'd check them: port open? · name resolves? · I have a gateway? · host reachable? · service healthy?