WAN Health Monitoring Guide

Overview

The WAN health monitoring daemon continuously evaluates every default route on the PDU—including gateways created for VLAN subinterfaces—and prioritizes the healthiest paths for outbound traffic. When a preferred route fails, the daemon automatically promotes backup routes according to their configured priorities and metrics.

Key Features

  • Automatic enrollment: Any default route discovered on the system or created via pdu-vlan create/pdu-route add is monitored without additional configuration.
  • Periodic probing: A configurable timer drives ping probes against per-route health targets.
  • Failover automation: Unhealthy routes are de-prioritized, while the best available path is promoted with lower metrics for deterministic routing.
  • Admin control: CLI shortcuts allow administrators to start, stop, and inspect the daemon directly from the shell.

CLI Access

The WAN health controls live under the existing network ip command tree. Shortcuts are also available for convenience.

# Long form
network ip wan-health <command>

# Shortcut
wan-health <command>

Supported subcommands:

  • enable – Turn on monitoring
  • disable – Pause monitoring (timer stops)
  • status – Show daemon status, monitored routes, and current preferences
  • run-check – Trigger an immediate health check cycle
  • failover-enable / failover-disable – Toggle automatic failover actions
  • config – View or adjust health targets, thresholds, and intervals

Monitoring Behavior

  1. Route discovery – The daemon scans the routing table for every default route present on the system. Gateways created with pdu-vlan create are picked up automatically, and any pre-existing default routes (for example, on the base interface) are enrolled as well.
  2. Health checks – Each route is probed via pings using the configured health_target (defaults to 8.8.8.8). Success/failure counters determine when a path is considered healthy or degraded.
  3. Failover decisions – When a route crosses the failure threshold, the daemon increases its metric so that healthier paths are preferred. Once the route meets the recovery threshold, the daemon restores its metric to the original value.
  4. State tracking – The daemon records route health changes internally so failover decisions remain consistent across restarts.
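
The threshold behavior in steps 2 and 3 can be illustrated with a small model. This is a minimal sketch, not the daemon's actual code: the RouteHealth class and its field names are hypothetical, the default thresholds are taken from the sample status output later in this guide, and it assumes that a counter resets whenever a probe returns the opposite result.

```python
from dataclasses import dataclass


@dataclass
class RouteHealth:
    """Hypothetical per-route state; names are illustrative only."""
    failure_threshold: int = 3   # value from the sample status output
    recovery_threshold: int = 2  # value from the sample status output
    healthy: bool = True
    successes: int = 0           # consecutive successful probes
    failures: int = 0            # consecutive failed probes

    def record(self, probe_ok: bool) -> bool:
        """Update the counters for one probe and return the resulting health."""
        if probe_ok:
            self.successes += 1
            self.failures = 0    # assumption: opposite result resets the counter
            if not self.healthy and self.successes >= self.recovery_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


route = RouteHealth()
# Three failed probes followed by two successful ones.
history = [route.record(ok) for ok in (False, False, False, True, True)]
print(history)  # [True, True, False, False, True]
```

In this model the route is marked unhealthy on the third consecutive failure and healthy again on the second consecutive success, which matches the thresholds 3 and 2.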

Managed Metrics and Route Protection

  • System defaults – The physical WAN interface keeps the reserved metric 0. Administrators cannot delete this path directly; use the platform network settings if you must re-home the primary uplink.
  • VLAN defaults – VLAN gateways created through pdu-vlan create (or provided by the DHCP helper script) use unique metrics within the 100–899 range. The CLI will reject duplicates so that the WAN health daemon always knows the intended priority order.
  • Failover normalization – When the daemon suppresses an unhealthy route it temporarily re-adds it with a +9000 offset (for example 200 → 9200). On the next discovery pass the value is normalized back into the original range so route history, status output, and configuration files continue to align.
  • CLI safeguards – pdu-route del and its long-form counterpart prevent removal of system-managed or VLAN-managed default routes. To retire one of these paths, delete the associated VLAN or disable WAN monitoring instead of issuing manual ip route commands.
  • DHCP renewals – The DHCP helper hook reapplies stored metrics whenever a lease refreshes, keeping priorities stable across reboots and link flaps.
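
The metric rules above can be summarized in a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that normalization simply subtracts the +9000 offset and that duplicate checks compare against the set of metrics already in use.

```python
from typing import Iterable

SUPPRESS_OFFSET = 9000               # offset described in the text above
VLAN_METRIC_RANGE = range(100, 900)  # 100 through 899 inclusive


def suppress(metric: int) -> int:
    """Metric applied while a route is considered unhealthy."""
    return metric + SUPPRESS_OFFSET


def normalize(metric: int) -> int:
    """Map a possibly suppressed metric back to its configured value."""
    return metric - SUPPRESS_OFFSET if metric >= SUPPRESS_OFFSET else metric


def validate_new_metric(metric: int, in_use: Iterable[int]) -> None:
    """Reject metrics outside the VLAN range or already assigned."""
    if metric not in VLAN_METRIC_RANGE:
        raise ValueError(f"metric {metric} is outside 100-899")
    if metric in set(in_use):
        raise ValueError(f"metric {metric} is already in use")


print(suppress(200))             # 9200
print(normalize(suppress(200)))  # 200
```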

Quick Start Workflow

# Enable monitoring and failover
wan-health enable
wan-health failover-enable

# Trigger an immediate health cycle
wan-health run-check

# Review status and monitored routes
wan-health status

If you need to pause monitoring temporarily:

wan-health disable
# ... perform maintenance ...
wan-health enable

Reading Command Output

Running wan-health run-check prints a short health report. Each line shows the interface, gateway, configured priority, and whether probes succeeded. The counters (S for successes and F for failures) help you see how close a route is to tripping the configured thresholds.

DX> wan-health run-check
Running automatic health check (with debug output)...

[WAN-HEALTH] Starting automatic health check at Mon Sep 22 11:34:34 2025
[WAN-HEALTH] Testing eth0 via 192.168.1.1 (priority 100)... SUCCESS (S:13 F:0)
[WAN-HEALTH] Testing eth0.2 via 192.168.20.2 (priority 200)... FAILED (S:0 F:182)
[WAN-HEALTH] Testing eth0.3 via 192.168.30.2 (priority 300)... SUCCESS (S:172 F:0)
[WAN-HEALTH] Testing eth0.8 via 192.168.80.2 (priority 800)... FAILED (S:0 F:182)
[WAN-HEALTH] Health check completed

Unhealthy paths continue to accumulate failure counts until they recover.
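
If you capture this output for logging or alerting, the per-route lines can be parsed with a regular expression. The following is a rough sketch that assumes the line format is exactly as shown in the sample above; the function name and the dictionary keys are hypothetical.

```python
import re

# Pattern derived from the sample output above; the real format may vary.
LINE_RE = re.compile(
    r"\[WAN-HEALTH\] Testing (?P<iface>\S+) via (?P<gateway>\S+) "
    r"\(priority (?P<priority>\d+)\)\.\.\. (?P<result>SUCCESS|FAILED) "
    r"\(S:(?P<successes>\d+) F:(?P<failures>\d+)\)"
)

SAMPLE = """\
[WAN-HEALTH] Starting automatic health check at Mon Sep 22 11:34:34 2025
[WAN-HEALTH] Testing eth0 via 192.168.1.1 (priority 100)... SUCCESS (S:13 F:0)
[WAN-HEALTH] Testing eth0.2 via 192.168.20.2 (priority 200)... FAILED (S:0 F:182)
[WAN-HEALTH] Testing eth0.3 via 192.168.30.2 (priority 300)... SUCCESS (S:172 F:0)
[WAN-HEALTH] Testing eth0.8 via 192.168.80.2 (priority 800)... FAILED (S:0 F:182)
[WAN-HEALTH] Health check completed
"""


def parse_run_check(output: str) -> list:
    """Return one dict per probed route; non-matching lines are skipped."""
    routes = []
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:
            row = match.groupdict()
            for key in ("priority", "successes", "failures"):
                row[key] = int(row[key])
            routes.append(row)
    return routes


routes = parse_run_check(SAMPLE)
failed = [r["iface"] for r in routes if r["result"] == "FAILED"]
print(failed)  # ['eth0.2', 'eth0.8']
```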

The wan-health status command summarises the current configuration and the same counters in a table, so you can review conditions without rerunning checks:

DX> wan-health status

WAN Health Monitoring Status
============================
Monitoring Enabled:     Yes
Automatic Failover:     Yes
Check Interval:         300 seconds
Failure Threshold:      3
Recovery Threshold:     2
Health Target:          8.8.8.8

Default Route Health Status
===========================
Interface    Gateway         Priority Health     Success      Failures
------------ --------------- -------- ---------- ------------ ------------
eth0         192.168.1.1     100      Healthy    12           0
eth0.2       192.168.20.2    200      Unhealthy  0            181
eth0.3       192.168.30.2    300      Healthy    171          0
eth0.8       192.168.80.2    800      Unhealthy  0            181

Use the priority column to confirm that your most important paths have the lowest numbers, and watch the Success/Failure counters to verify that thresholds are behaving as expected.
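
To make the ordering rule concrete, the sketch below picks the route that should currently be preferred from the table rows above: the healthy route with the lowest priority number. It is an approximation of the selection logic rather than the daemon's implementation, and the function name is hypothetical.

```python
# Rows copied from the sample status table above.
STATUS_ROWS = """\
eth0         192.168.1.1     100      Healthy    12           0
eth0.2       192.168.20.2    200      Unhealthy  0            181
eth0.3       192.168.30.2    300      Healthy    171          0
eth0.8       192.168.80.2    800      Unhealthy  0            181
"""


def preferred_route(rows_text: str):
    """Return (priority, interface, gateway) of the best healthy route, or None."""
    candidates = []
    for line in rows_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip headers, separators, and blank lines
        iface, gateway, priority, health, _successes, _failures = fields
        if health == "Healthy":
            candidates.append((int(priority), iface, gateway))
    return min(candidates) if candidates else None


print(preferred_route(STATUS_ROWS))  # (100, 'eth0', '192.168.1.1')
```

With the sample data, eth0 is selected. If eth0 became unhealthy, the same rule would select eth0.3 at priority 300, because eth0.2 is also unhealthy.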

Integrating with VLAN Configuration

When you create VLAN subinterfaces using pdu-vlan create --vid <n> --ip <addr/prefix> --gateway <gw> --priority <metric>, the daemon automatically adds the supplied gateway to its watch list. Priorities you choose (e.g., 200/300/400) directly influence failover order—lower numbers represent preferred paths.

Similarly, any existing default routes added via pdu-route add --dst default --via <gw> [--metric <n>] are monitored. Ensure backup routes have higher metrics so the daemon can re-prioritize them effectively.

Advanced Configuration

The monitoring daemon uses the following configuration settings:

  • interval – Probe interval in seconds
  • failure_threshold – Consecutive failures before a route is marked unhealthy
  • recovery_threshold – Consecutive successes before promoting a route
  • health_target – Default probe destination (per-route overrides supported)

Use wan-health config to inspect or adjust these values; any changes are persisted and applied on the next cycle.
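
As an illustration of how a default probe target and per-route overrides might interact, consider the sketch below. The WanHealthConfig class, the route_targets field, and the choice to key overrides by interface name are all assumptions made for this example; the default values mirror the sample status output, and 192.0.2.1 is a documentation-only address used as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class WanHealthConfig:
    """Hypothetical model of the settings listed above."""
    interval: int = 300            # seconds, from the sample status output
    failure_threshold: int = 3
    recovery_threshold: int = 2
    health_target: str = "8.8.8.8"
    # Assumption: overrides are keyed by interface name.
    route_targets: dict = field(default_factory=dict)

    def target_for(self, iface: str) -> str:
        """Return the per-route override if one exists, else the default."""
        return self.route_targets.get(iface, self.health_target)


cfg = WanHealthConfig(route_targets={"eth0.2": "192.0.2.1"})
print(cfg.target_for("eth0"))    # 8.8.8.8
print(cfg.target_for("eth0.2"))  # 192.0.2.1
```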

Additional Resources

  • PDU VLAN and Routing Management – Detailed instructions for creating VLAN interfaces and custom routes.
  • Diagnostics Toolkit – Admin-only commands traceroute and mtr provide hop-by-hop and real-time latency insights; ideal companions when investigating WAN events.

For deeper troubleshooting or to discuss best practices, reach out to Synaccess support.