Build log · MikroTik · BGP + BFD failover

Fast IPv6 failover on RouterOS

Add BFD to the existing BGP session over WireGuard — BFD down in ~700 ms, route withdrawn fast enough for Happy Eyeballs. Pick an Ubuntu/BIRD, VyOS, or CHR relay implementation.

Overview

This is an optional enhancement to the VPS path post, which is a prerequisite — the WireGuard layer, the BGP session, the /48 aggregate, and the learned ::/0 default route all come from there. Adding BFD to that same BGP session collapses dead-tunnel detection to about 700 ms instead of waiting for BGP hold-time expiry.

This applies to the VPS path only (one of the two paths in the RB5009 CGNAT series). If you took the Route64 /56 path instead, skip this post: that path has no self-operated BGP relay and ships its own netwatch-driven fail-to-IPv4 (Route64 post §7).

The problem it solves: a WireGuard interface stays administratively UP even when the path is dead (NAT mapping expired, peer rebooted, VPS null-routed), so interface state alone is not a failure signal. Plain BGP will withdraw the default route when the session dies, but only after its hold timer expires. BFD on the same session cuts detection to under a second: on the measured path BFD went down about 700 ms after its packets stopped and the default route was withdrawn at about 1.5 s, so pings fail once and clients are already on IPv4 by the next attempt.

| Event | Measured |
| --- | --- |
| BFD packets blocked → BFD down | ~700 ms |
| BFD packets blocked → default route withdrawn | ~1.5 s |
| BFD packets restored → route reinstalled | ~10 s |
| Full WG service stop/start → route reinstalled | ~17 s |
| BFD bandwidth (200 ms × 3, bidirectional) | ~3.4 GB / mo |
| BFD cost at $2.50/TB | ~$0.0085 / mo |
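The bandwidth and cost rows can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope estimate, not a measurement: the per-packet sizes are assumptions (an unauthenticated 24-byte BFD control packet, an IPv6 inner header, WireGuard over an IPv4 underlay, no padding and no link-layer framing), and a 30-day month is assumed.

```bash
#!/usr/bin/env bash
# Rough estimate of monthly BFD traffic through the tunnel.
# Every size below is an assumption, not a value taken from a capture.
interval_ms=200                 # min tx interval configured on each side
bfd_payload=24                  # BFD control packet without authentication
inner_overhead=$((40 + 8))      # inner IPv6 header + UDP header
wg_overhead=$((16 + 16))        # WireGuard data header + authentication tag
outer_overhead=$((20 + 8))      # outer IPv4 header + UDP header (assumed underlay)
packet_bytes=$((bfd_payload + inner_overhead + wg_overhead + outer_overhead))

packets_per_sec=$((2 * 1000 / interval_ms))   # both directions combined
seconds_per_month=$((30 * 24 * 3600))         # 30-day month
bytes_per_month=$((packet_bytes * packets_per_sec * seconds_per_month))

# awk handles the floating-point formatting; LC_ALL=C keeps the decimal point.
gb_per_month=$(LC_ALL=C awk -v b="$bytes_per_month" 'BEGIN { printf "%.1f", b / 1e9 }')
cost_per_month=$(LC_ALL=C awk -v b="$bytes_per_month" 'BEGIN { printf "%.4f", b / 1e12 * 2.50 }')

echo "packet size: ${packet_bytes} bytes"
echo "traffic:     ${gb_per_month} GB/month"
echo "cost:        \$${cost_per_month}/month at \$2.50/TB"
```

Under these assumptions the estimate comes to 132 bytes per packet and about 3.4 GB per month, within rounding of the table. Padding or a different underlay would shift the result by a few percent.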

Design decisions

BFD inserts between BGP (which it monitors) and RA/SLAAC (which consumes the routes BGP exchanges), filling in the failure-detection layer the base build's stack is missing:

text

WireGuard = encrypted transport and peer authorization
BGP       = route exchange
BFD       = fast failure detection   ← added by this post
RA/SLAAC  = client addressing

The three layers around it — including the Table = off + AllowedIPs cryptokey-routing rationale this depends on — are covered in the VPS post's §3 Return routing. This post adds only the BFD line and the two edits each side needs to turn it on.

1. Conventions and placeholders

All placeholders come from the VPS post: <LAN_PREFIX>, <VPS_AS> / <HOME_AS>, and <VPS_ROUTER_ID> / <HOME_ROUTER_ID> are defined in its §2, and the wg-host interface is created in its §5. Substitute the same values here before pasting.

2. VPS — add BFD to bird2 and nftables

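Three surgical edits on the VPS: append a protocol bfd block to the existing /etc/bird/bird.conf, add bfd on; inside its protocol bgp home block, and add one wg0-only allow line to /etc/nftables.conf. The block also installs a systemd drop-in so bird restarts on failure. Re-pasting duplicates all three edits and overwrites the .pre-bfd backups with the already-edited files, so restore from the backups before any re-run.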

bird2 + nftables: BFD-only diffs

bash

cp /etc/bird/bird.conf /etc/bird/bird.conf.pre-bfd
cp /etc/nftables.conf /etc/nftables.conf.pre-bfd

# 1. Append a BFD protocol to bird.
cat >>/etc/bird/bird.conf <<'EOF'

protocol bfd {
  interface "wg0" {
    min rx interval 200 ms;
    min tx interval 200 ms;
    idle tx interval 1 s;
    multiplier 3;
  };
  # Explicit neighbor so bird actively probes; passive-only stalls after a
  # flap because both sides wait for the other.
  neighbor <LAN_PREFIX>:0::2 dev "wg0";
}
EOF

# 2. Turn BFD on inside the existing BGP session.
sed -i '/^protocol bgp home {/a\  bfd on;' /etc/bird/bird.conf

# 3. Add the wg0-only BFD/3784 allow next to the existing BGP/179 line.
sed -i '/iifname "wg0" tcp dport 179 accept/a\    iifname "wg0" udp dport 3784 accept  # BFD from home, wg0 only' /etc/nftables.conf
systemctl reload nftables

# Restart bird so it loads the new BFD protocol and re-establishes BGP with
# BFD enabled. Restart-on-failure for resilience after a tunnel flap.
mkdir -p /etc/systemd/system/bird.service.d
printf '[Service]\nRestart=on-failure\nRestartSec=2s\n' \
  > /etc/systemd/system/bird.service.d/restart.conf
systemctl daemon-reload && systemctl restart bird

The explicit neighbor in protocol bfd matters. Without it, bird is passive and only responds to probes; after a flap the home router waits for BFD before re-establishing BGP, bird waits for BGP before initiating BFD, and recovery needs a manual birdc restart.
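Because the edits in this section are not idempotent, a small guard can make an append safe to repeat. The sketch below is one possible approach, not part of the original build: append_once is a hypothetical helper name, and the demo runs against a scratch file rather than /etc/bird/bird.conf.

```bash
#!/usr/bin/env bash
# append_once FILE MARKER
# Appends stdin to FILE unless the fixed string MARKER is already present.
# Hypothetical helper for illustration; not part of bird or nftables.
append_once() {
  local file=$1 marker=$2
  if grep -qF -- "$marker" "$file"; then
    cat >/dev/null                      # drain stdin so the caller's heredoc is consumed
    echo "skip: '$marker' already in $file"
    return 0
  fi
  cat >>"$file"
  echo "appended to $file"
}

# Demo on a scratch file standing in for bird.conf.
scratch=$(mktemp)
printf 'protocol bgp home {\n}\n' >"$scratch"

for run in 1 2; do
  append_once "$scratch" 'protocol bfd {' <<'EOF'

protocol bfd {
  # body as in the block above
}
EOF
done

# The marker appears once even though the append ran twice.
grep -cF 'protocol bfd {' "$scratch"
```

The two sed insertions could be guarded the same way, by testing for the inserted line with grep -qF before running sed. That is an assumption about your files: it only works if the inserted text does not already appear elsewhere in them.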

3. Home router — enable BFD on the existing BGP session

RouterOS BGP + BFD

bash

/routing/bgp/template/set [find name=tpl-host] use-bfd=yes
/routing/bgp/connection/set [find name=host-vps] use-bfd=yes

/routing/bfd/configuration/add interfaces=wg-host \
    min-rx=200ms min-tx=200ms multiplier=3

# Skip the next add if a UDP/3784 accept rule on wg-host already exists
# from a previous BFD setup; RouterOS does not deduplicate filter rules.
/ipv6/firewall/filter add chain=input action=accept protocol=udp dst-port=3784 \
    in-interface=wg-host comment="BFD from VPS" \
    place-before=[find where chain=input and comment="defconf: drop everything else not coming from LAN"]

BFD's UDP/3784 control packets arrive unsolicited, so the defconf established,related,untracked rule that lets the home-router-initiated BGP session work does not cover them. The explicit allow above is the one new input rule this companion adds on the home router. The matching nftables line on the VPS side is in §2.

BFD is added on top of the BGP session the base build already created. The template and the connection both get use-bfd=yes, the BFD configuration entry defines the timing, and the firewall rule admits the control packets. Both use-bfd=yes lines are needed: on the live home router, setting only the template did not alter the already-created connection, which needed its own use-bfd=yes.
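The timing values used on both sides (200 ms intervals, multiplier 3) determine the nominal detection time. As a rough sketch, assuming asynchronous mode as described in RFC 5880, detection time is the remote multiplier times the larger of the local min-rx and the remote min-tx. The function name is hypothetical, and the 180 s hold time is an illustrative assumption, not a value from this series.

```bash
#!/usr/bin/env bash
# Nominal BFD detection time in asynchronous mode:
#   remote multiplier x max(local min-rx, remote min-tx)
# bfd_detect_ms is a hypothetical helper name used only for this sketch.
bfd_detect_ms() {
  local remote_mult=$1 local_min_rx_ms=$2 remote_min_tx_ms=$3
  local interval_ms=$(( local_min_rx_ms > remote_min_tx_ms ? local_min_rx_ms : remote_min_tx_ms ))
  echo $(( remote_mult * interval_ms ))
}

detect_ms=$(bfd_detect_ms 3 200 200)
echo "nominal detection time: ${detect_ms} ms"

# Illustrative comparison only: 180 s is an assumed BGP hold time.
hold_time_ms=$((180 * 1000))
echo "assumed BGP hold time:  ${hold_time_ms} ms ($((hold_time_ms / detect_ms))x longer)"
```

With the values from this post the nominal figure is 600 ms, which sits close to the measured ~700 ms. If the two sides were configured with different intervals, the slower one would govern.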

4. Verification

Confirm BGP/BFD and the failover

bash

# On the VPS:
birdc show protocols                     # bgp + bfd both Established/Up
birdc show route <LAN_PREFIX>::/48       # learned from home
ip -6 route show <LAN_PREFIX>::/48       # proto bird, metric 32, via wg0
wg show wg0 allowed-ips                  # /48 is allowed, not route-owned

# On the home router:
/routing/bgp/session/print               # established
/routing/bfd/session/print               # state=up
/ipv6/route/print where dst-address=::/0 # bgp, distance 20
/ipv6/route/print where dst-address=<LAN_PREFIX>::/48

# Failover: stop WireGuard on the VPS and time it.
#   wg-quick down wg0
# A client's IPv6 should drop quickly enough for Happy-Eyeballs to move on
# to IPv4; on the measured live path, the route disappeared in ~1.5 s and
# reappeared about 10 s after BFD packets were restored.

A brief IPv6 miss and quick fallback to IPv4 — instead of waiting for BGP hold-time expiry — is the whole point of this change.
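To put rough numbers on the failover from a LAN client, a loop that logs each change in reachability is usually enough. This is a minimal sketch under stated assumptions: log_transitions is a hypothetical helper, the timestamp format relies on GNU date, and the ping target in the usage comment is a documentation-prefix placeholder. The resolution is limited by the probe interval and the ping timeout, so treat the output as approximate.

```bash
#!/usr/bin/env bash
# log_transitions COUNT INTERVAL PROBE [ARGS...]
# Runs PROBE up to COUNT times, sleeping INTERVAL seconds between runs, and
# prints "<epoch-seconds> up|down" each time the probe's result changes.
# Hypothetical helper for illustration only.
log_transitions() {
  local count=$1 interval=$2
  shift 2
  local prev="" state i
  for ((i = 0; i < count; i++)); do
    if "$@" >/dev/null 2>&1; then state=up; else state=down; fi
    if [[ $state != "$prev" ]]; then
      printf '%s %s\n' "$(date +%s.%N)" "$state"   # %N needs GNU date
      prev=$state
    fi
    sleep "$interval"
  done
}

# Usage on a Linux LAN client (placeholder target; substitute a real IPv6 host):
#   log_transitions 600 0.1 ping -6 -c 1 -W 1 2001:db8::1
# Then run the failover step on the VPS and compare the "down" and "up"
# timestamps with the moment you issued it.
```

The gap between issuing the failover step and the first "down" line approximates the withdrawal time, and the gap between restoring the tunnel and the next "up" line approximates recovery.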

If you'd rather have a second active upstream than a fast fall-to-IPv4 on the single one, see Multi-homing IPv6 over CGNAT on RouterOS — the series finale. It carries the same BFD layer on the VPS session, adds a parallel Route64 BGP session under one announceable /48, and lets RouterOS best-path pick the active default. Requires a 32-bit ASN and an announceable /48 as hard prerequisites.
