BGP failover latency spikes after ASN migration
So we just migrated our secondary ASN from a smaller provider to a larger one for better redundancy, and now I'm seeing weird latency spikes during failover events. The primary path converges fine, but when traffic switches to the backup via BGP, we're getting 200-300ms of jitter for like 10-15 seconds before it stabilizes.
Both ASNs are configured to take full tables from Vultr and Hetzner. I've tuned the timers (hold time 9s, keepalive 3s), but that didn't help. Could this be a RIB/FIB propagation issue, or am I missing something in the announce/withdraw sequence?
Has anyone else dealt with this? Multi-path failover is supposed to be transparent, lol. What am I overlooking?
Edited at 26 Mar 2026, 20:20
Sounds like RIB-FIB sync delay or maybe route flapping on the secondary path. Have you checked if the backup ASN is actually receiving the full table from both peers, or just one? I've seen this where a secondary provider only had partial coverage and convergence took forever.
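If you can export the prefixes each peer sends you (for example, by dumping each neighbor's received routes to a text file, one prefix per line), a quick set comparison shows whether one peer is only giving you partial coverage. This is a minimal Python sketch, not a tool from either provider. The function name, the input format, and the sample prefixes (RFC 5737 documentation ranges) are all made up for illustration:

```python
import ipaddress


def table_coverage(peer_a, peer_b):
    """Compare the prefix sets received from two BGP peers.

    peer_a / peer_b: iterables of prefix strings (hypothetical input,
    e.g. one prefix per line dumped from each neighbor's received routes).
    """
    a = {ipaddress.ip_network(p) for p in peer_a}
    b = {ipaddress.ip_network(p) for p in peer_b}
    union = a | b

    def ordered(nets):
        # Sort IPv4 before IPv6, then numerically, and return as strings.
        return [str(n) for n in sorted(nets, key=lambda n: (n.version, n))]

    return {
        "count_a": len(a),
        "count_b": len(b),
        "missing_from_a": ordered(union - a),
        "missing_from_b": ordered(union - b),
        # Fraction of all prefixes seen from either peer that this peer sent.
        "coverage_a": len(a) / len(union) if union else 1.0,
        "coverage_b": len(b) / len(union) if union else 1.0,
    }


# Made-up sample data using RFC 5737 documentation prefixes.
hetzner = ["203.0.113.0/24", "198.51.100.0/24", "192.0.2.0/24"]
vultr = ["203.0.113.0/24"]
report = table_coverage(hetzner, vultr)
print(report["count_a"], report["count_b"])  # 3 1
print(report["missing_from_b"])  # ['192.0.2.0/24', '198.51.100.0/24']
```

On a real full table the counts alone usually tell the story; the missing lists are for spotting whether the gap is specific regions or address families.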
Also worth checking: are you seeing actual packet loss during the spike, or just latency? If it's pure latency with no drops, it could be queuing on the backup link. Might be asymmetric routing too, so check both directions: something like https://ping.pe/ shows how traffic reaches you from outside, and a traceroute from your side shows the return path, so you can confirm traffic is actually taking the paths you expect.
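To tell pure latency apart from loss, it helps to summarize the raw RTT samples from the failover window instead of eyeballing them. A rough Python sketch, assuming you've collected one RTT per probe with lost probes recorded as None. The jitter figure here is just the mean absolute difference between consecutive replies, a simplification rather than the RFC 3550 estimator, and the sample values are hypothetical:

```python
import statistics


def summarize_probe(rtts_ms):
    """Summarize one ping run; lost probes are recorded as None."""
    if not rtts_ms:
        raise ValueError("no samples")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Simple jitter: mean absolute change between consecutive replies
    # (not the smoothed RFC 3550 interarrival-jitter estimator).
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "median_ms": statistics.median(received) if received else None,
        "jitter_ms": statistics.mean(diffs) if diffs else 0.0,
    }


# Hypothetical samples: spiky but lossless vs. lossy but stable.
spiky = [20.0, 240.0, 30.0, 260.0, 20.0]
lossy = [20.0, None, None, 21.0]
print(summarize_probe(spiky))  # {'loss_pct': 0.0, 'median_ms': 30.0, 'jitter_ms': 225.0}
print(summarize_probe(lossy))  # {'loss_pct': 50.0, 'median_ms': 20.5, 'jitter_ms': 1.0}
```

High jitter with 0% loss suggests queuing or a path change; loss alongside it points more toward the backup path not being fully ready when traffic lands on it.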
One more thing: what's your IGP doing? If you're relying on BGP timers alone without fast IGP failover, that 10-15s sounds about right for full convergence.
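Rough arithmetic on the timers from the original post: the hold timer restarts every time a keepalive (or update) arrives, so if the path dies silently, the session is declared down somewhere between (hold - keepalive) and hold seconds after the failure. With hold 9s and keepalive 3s, that's roughly 6 to 9 seconds before BGP even starts withdrawing routes, which eats most of a 10-15s window. A small sketch of that calculation, assuming the failure is silent (a hard interface-down on a directly connected eBGP session can be detected much faster on many platforms); the function name is just for illustration:

```python
def hold_timer_detection_window(hold_s, keepalive_s):
    """Return (min, max) seconds from a silent failure to hold-timer expiry.

    The hold timer restarts whenever a keepalive (or update) arrives, so it
    expires hold_s seconds after the last message that got through.
    """
    if not 0 < keepalive_s < hold_s:
        raise ValueError("expected 0 < keepalive < hold")
    # Failure just before the next keepalive was due: the timer has already
    # been running for about keepalive_s seconds.
    earliest = hold_s - keepalive_s
    # Failure right after a keepalive arrived: the full hold time elapses.
    latest = hold_s
    return earliest, latest


print(hold_timer_detection_window(9, 3))  # (6, 9)
```

Route propagation and FIB programming come on top of this window, so the timers alone leave little room to get under 10s.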
Good point on the RIB-FIB sync. I checked, and the secondary ASN is only getting a full table from Hetzner, not Vultr. Let me add that second peer and see if it clears up the jitter. Thanks!
Before you add that Vultr peer, check whether it's actually a peering issue or a traffic-engineering one. I'd run show ip bgp summary on the routers in both ASNs during failover to confirm session state and prefix counts, then show ip bgp <prefix> for a few key destinations to compare AS_PATH lengths. If the backup path has significantly longer prepends or more AS hops, it loses best-path selection while the primary is up and carries traffic over a longer route once it takes over, which shows up as extra latency. Also verify that BGP multipath is actually load-sharing on the primary; sometimes the jitter is just the backup kicking in because the primary wasn't truly active. Use https://bgp.tools/ to check what the world sees from your ASNs.
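To separate prepending from genuinely longer paths when you compare AS_PATHs (for example from show ip bgp <prefix> output), you can count both the raw path length, which is what best-path selection compares with prepends included, and the number of distinct ASNs. A small Python sketch; the AS_PATH strings use RFC 5398 documentation ASNs and are hypothetical, and it assumes a plain AS_SEQUENCE with no AS_SET in braces:

```python
def as_path_stats(as_path):
    """Return (hop count as BGP compares it, distinct ASNs, extra prepends).

    as_path: space-separated AS_SEQUENCE as printed by most router CLIs.
    AS_SETs ("{...}") are not handled in this sketch.
    """
    hops = [int(asn) for asn in as_path.split()]
    distinct = len(set(hops))
    # In a loop-free path, repeated ASNs only come from prepending.
    return len(hops), distinct, len(hops) - distinct


primary = "64496 64500"  # hypothetical primary path
backup = "64497 64497 64497 64498 64500"  # hypothetical backup path
print(as_path_stats(primary))  # (2, 2, 0)
print(as_path_stats(backup))  # (5, 3, 2)
```

Here the backup path is 5 hops long for selection purposes but only crosses 3 networks, so most of the difference is prepending rather than physical distance.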
Have you checked whether route refresh with Hetzner is working correctly after the migration? Sometimes the secondary peer doesn't immediately re-advertise on state changes. Also worth checking your local preference: if routes learned from both peers end up with the same LOCAL_PREF (it's set by your own inbound policy, not advertised by the eBGP peer), you might be getting asymmetric traffic during convergence. Run show ip bgp neighbors <peer-ip> received-routes (on Cisco IOS this needs soft-reconfiguration inbound for that neighbor) or show ip bgp neighbors <peer-ip> routes on both sessions to confirm the peers are actually sending you equivalent tables; advertised-routes shows what you announce to them, not what they send you. The 10-15s stabilization window sounds less like timers and more like a filtering/export-policy issue on the new provider's side.
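The local-preference point comes down to the order of BGP best-path selection: higher LOCAL_PREF is compared before AS_PATH length, so if both peers' routes land with the same LOCAL_PREF, the shorter AS_PATH wins regardless of which peer you meant to be primary. A heavily simplified Python sketch of just those two steps; the Route fields and peer names are hypothetical, and real implementations check more attributes (weight on Cisco, origin, MED, eBGP vs iBGP, IGP cost, router ID and so on):

```python
from dataclasses import dataclass


@dataclass
class Route:
    peer: str  # hypothetical label for the session the route came from
    local_pref: int  # set by your inbound policy, not sent by the eBGP peer
    as_path_len: int  # AS_PATH length, prepends included


def best_path(routes):
    """Pick a best route using only LOCAL_PREF, then AS_PATH length.

    Higher LOCAL_PREF wins; on a tie, the shorter AS_PATH wins.
    """
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len))


tuned = [Route("primary", 200, 3), Route("backup", 100, 2)]
untuned = [Route("primary", 100, 3), Route("backup", 100, 2)]
print(best_path(tuned).peer)  # primary
print(best_path(untuned).peer)  # backup
```

With inbound policy setting LOCAL_PREF 200 on the primary, it wins despite the longer path; with both at 100, the backup wins on AS_PATH length alone, which is exactly the kind of unintended path choice that can make traffic asymmetric.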