path: root/zebra/zebra_evpn_mh.h
Commit message | Author | Age | Files | Lines
* zebra: support for lacp bypass with EVPN MH | Anuradha Karuppiah | 2021-02-24 | 1 | -0/+6

    Feature overview:
    =================
    An 802.3ad bond can be set up to allow lacp-bypass. This is done to enable
    servers to PXE boot without a LACP license, i.e. it allows the bond to go
    oper up (with a single link) without LACP converging.

    If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
    bond. This involves the following special handling -
    1. If the bond is in a bypass-state the associated ES is placed in a
       bypass state.
    2. If an ES is in a bypass state -
       a. DF election is disabled (i.e. assumed DF)
       b. SPH filter is not installed.
    3. MACs learnt via the host bond are advertised with a zero ESI.
       When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
       the correct non-zero id. This is treated as a local station move.

    Implementation:
    ===============
    When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
    LACP bypass, zebra deletes all the local MACs (with that ES as destination)
    in the kernel and its local db. BGP re-sends any imported MAC-IP routes that
    may exist with this ES destination as remote routes, i.e. zebra can end up
    programming a MAC that was previously local as remote pointing to a
    VTEP-ECMP group.

    When an ES is attached to a hostbond or an ES-bond goes LACP-up (out of
    bypass), zebra again deletes all the local MACs in the kernel and its local
    db. At this point BGP resends any imported MAC-IP routes that may exist with
    this ES destination as sync routes, i.e. zebra can end up programming a MAC
    that was previously remote as local pointing to an access port.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
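The special handling listed in the feature overview can be sketched as a small decision function. This is only an illustration of the three rules, not zebra's code: the function name, the dictionary keys and the `elected_df` input are hypothetical, and the non-bypass branch assumes the SPH filter would otherwise be installed.

```python
# Hypothetical sketch of the bypass rules from the commit message.
ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"


def es_forwarding_state(esi, bypass, elected_df):
    """Return how an ES-bond is treated, given its bypass state.

    esi        -- the ES's configured (non-zero) ESI string
    bypass     -- True if the bond is oper-up in LACP-bypass
    elected_df -- result of the normal DF election (ignored in bypass)
    """
    if bypass:
        return {
            "df": True,                   # DF election disabled: assumed DF
            "sph_filter_installed": False,  # SPH filter is not installed
            "mac_esi": ZERO_ESI,          # MACs advertised with a zero ESI
        }
    # Assumption: outside bypass the ES behaves as a regular ES.
    return {"df": elected_df, "sph_filter_installed": True, "mac_esi": esi}
```

Moving out of bypass changes `mac_esi` from the zero ESI back to the real one, which the commit treats as a local station move.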
* zebra: fix problem with SVI MAC not being sent to BGP | Anuradha Karuppiah | 2021-02-19 | 1 | -0/+6

    For MH the SVI MAC is advertised to prevent flooding of ARP replies.
    But because of a bug the SVI MAC was being added to the zebra database
    but not sent to bgpd for advertising.

    Ticket: CM-33329
    Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
* zebra: changes to advertise SVI mac by default if evpn-mh is enabled | Anuradha Karuppiah | 2021-02-19 | 1 | -0/+13

    Added support for advertising SVI MAC if EVPN-MH is enabled. In the case
    of EVPN MH, ARP replies from an attached server can be sent to the ES-peer.
    To prevent flooding of the reply the SVI MAC needs to be advertised by
    default.

    Note: advertise-svi-ip could have been used as an alternate way to
    advertise SVI MAC. However that config cannot be turned on if SVI IPs are
    re-used (which is done to avoid wasting IP addresses in a subnet).

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: advertise stale neighs if EVPN-MH is not enabled | Anuradha Karuppiah | 2020-12-21 | 1 | -0/+11

    With EVPN-MH, Type-2 routes are also used for MAC-IP syncing between
    ES peers so a change was done to only treat REACHABLE local neigh
    entries as local-active and advertise them as Type-2 routes, i.e. STALE
    neigh entries are no longer advertised as Type-2s.

    This however exposed some unexpected problems with MLAG where a
    secondary reboot followed by a primary reboot left a lot of neighs in
    STALE state (on the primary), resulting in them not being advertised and
    in remote routed traffic to those hosts being blackholed in a sym-IRB
    setup.

    This commit is a workaround to fix the regression (it doesn't fix the
    underlying problems with entries not becoming REACHABLE, which may be
    a day-1 problem). The workaround is to continue advertising STALE
    neighbors if EVPN-MH is not enabled.

    Ticket: CM-30303
    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
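The workaround amounts to one extra condition on the neighbor state. A minimal sketch, assuming the kernel neighbor state is available as a string; the function name is hypothetical, and treating every state other than REACHABLE and STALE as "not advertised" is an assumption, since the commit message only discusses those two.

```python
def advertise_local_neigh(state, evpn_mh_enabled):
    """Decide whether a local neigh entry is advertised as a Type-2 route."""
    if state == "REACHABLE":
        # REACHABLE entries are local-active with or without EVPN-MH.
        return True
    if state == "STALE":
        # Workaround: keep advertising STALE neighs unless EVPN-MH is on.
        return not evpn_mh_enabled
    # Assumption: other states are not covered by the commit message.
    return False
```

With EVPN-MH disabled (for example an MLAG deployment), a neighbor stuck in STALE is still advertised, which avoids the blackholing described above.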
* zebra: add support for DF delay timer | Anuradha Karuppiah | 2020-12-15 | 1 | -0/+7

    When a new ES is created it is held in a non-DF state for 3 seconds
    as specified by RFC7432. This allows the switch time to import
    the Type-4 routes from the peers, and the peers time to rx the new
    Type-4 route.

    root@torm-11:mgmt:~# vtysh -c "show evpn es 03:44:38:39:ff:ff:01:00:00:01"|grep DF
     DF status: non-df
     DF delay: 00:00:01
     DF preference: 50000
    root@torm-11:mgmt:~# vtysh -c "show evpn es 03:44:38:39:ff:ff:01:00:00:01"|grep DF
     DF status: df
     DF preference: 50000
    root@torm-11:mgmt:~#

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: change the L2 NHG id format to co-exist with the L3 NHG ids | Anuradha Karuppiah | 2020-12-01 | 1 | -4/+4

    It is now 4 bits of type and 28 bits of value -
    1. type=0 is for L3 NHG
    2. type=1 is for L2 NH
    3. type=2 is for L2 NHG

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
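The id layout can be illustrated with a pair of encode/decode helpers. This is a sketch, not the zebra macros: the names are hypothetical, and placing the 4-bit type in the top bits of a 32-bit id is an assumption. Under that assumption an id such as 268435461, as seen in `ip nexthop` output elsewhere in this log, decodes to type=1 (L2 NH), value=5.

```python
NHG_VALUE_BITS = 28                      # 28 bits of value
NHG_TYPE_BITS = 4                        # 4 bits of type
NHG_VALUE_MASK = (1 << NHG_VALUE_BITS) - 1

NHG_TYPE_L3 = 0      # L3 NHG
NHG_TYPE_L2_NH = 1   # L2 NH
NHG_TYPE_L2_NHG = 2  # L2 NHG


def encode_nh_id(nh_type, value):
    """Pack a type and a value into one 32-bit nexthop id."""
    if not 0 <= nh_type < (1 << NHG_TYPE_BITS):
        raise ValueError("type does not fit in 4 bits")
    if not 0 <= value <= NHG_VALUE_MASK:
        raise ValueError("value does not fit in 28 bits")
    return (nh_type << NHG_VALUE_BITS) | value


def decode_nh_id(nh_id):
    """Split a 32-bit nexthop id into (type, value)."""
    return nh_id >> NHG_VALUE_BITS, nh_id & NHG_VALUE_MASK
```

Because L3 NHGs use type=0, their ids are unchanged by the scheme, which is what lets the L2 and L3 id spaces co-exist.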
* zebra: allocate one nexthop id per-VTEP instead of one per-ES-VTEP | Anuradha Karuppiah | 2020-12-01 | 1 | -3/+30

    This is an optimization to reduce the number of L2 nexthops. An L2 or
    fdb nexthop simply provides the dataplane with a nexthop IP -

    torm-12:mgmt:~# ip nexthop
    id 268435461 via 27.0.0.20 scope link fdb
    id 268435463 via 27.0.0.20 scope link fdb
    id 268435465 via 27.0.0.20 scope link fdb

    So there is no need to allocate a nexthop per-ES/per-VTEP. There can be
    100+ ESs per-VTEP so this change cuts the scale down by a factor of 100.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
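The scale argument can be checked with a toy allocator that hands out one id per key. The class and function names below are hypothetical; the only point is the choice of key, (ES, VTEP) before the change and VTEP alone after it.

```python
from itertools import count


class L2NexthopAllocator:
    """Hand out one nexthop id per distinct key (illustrative only)."""

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)

    def get(self, key):
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    def __len__(self):
        return len(self._ids)


def count_l2_nexthops(es_vtep_pairs, per_vtep):
    """Count nexthops needed for a list of (esi, vtep_ip) pairs."""
    alloc = L2NexthopAllocator()
    for esi, vtep_ip in es_vtep_pairs:
        # per_vtep=True keys on the VTEP IP only; False keys on the pair.
        alloc.get(vtep_ip if per_vtep else (esi, vtep_ip))
    return len(alloc)
```

For 100 ESs that all share the VTEP 27.0.0.20, keying on the pair needs 100 nexthops while keying on the VTEP needs one.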
* zebra: support for slow-failover of local MACs on an ES | Anuradha Karuppiah | 2020-12-01 | 1 | -1/+17

    When a local ES flaps there are two modes in which the local
    MACs are failed over -
    1. Fast failover - A backup NHG (ES-peer group) is programmed in the
       dataplane per-access port. When a local ES flaps the MAC entries
       are left unaltered, i.e. pointing to the down access port. The
       dataplane redirects traffic destined to the oper-down access port
       via the backup NHG.
    2. Slow failover - This mode needs to be turned on to allow dataplanes
       not capable of re-directing traffic. In this mode local MAC entries
       on a down local ES are re-programmed to point to the ES-peers' NHG.
       And vice-versa, i.e. when the ES comes up the MAC entries are
       re-programmed with the access port as dest.

    Fast failover is on by default. Slow failover can be enabled via the
    following config -
    evpn mh redirect-off

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
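The difference between the two modes is which destination zebra programs for a local MAC while the ES is down. A minimal sketch with hypothetical names, where `redirect_off` mirrors the `evpn mh redirect-off` config:

```python
def mac_destination(es_oper_up, redirect_off, access_port, peer_nhg):
    """Return the destination programmed for a local MAC on an ES.

    Fast failover (redirect_off=False): the entry always points at the
    access port; the dataplane itself redirects via the backup NHG.
    Slow failover (redirect_off=True): the entry is re-programmed to the
    ES-peers' NHG while the ES is down and back to the port when it is up.
    """
    if redirect_off and not es_oper_up:
        return peer_nhg
    return access_port
```

In fast-failover mode the function never returns the peer NHG, which matches the statement that MAC entries are left unaltered when the ES flaps.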
* zebra: Keep DAD disabled if EVPN MH is turned on | Anuradha Karuppiah | 2020-11-24 | 1 | -0/+12

    DAD is not supported currently with EVPN-MH so we turn it off internally
    when the first ES config is detected.

    PS: Note that when all local ESs are deleted DAD will stay off and will
    need to be cleared via a daemon restart.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: uplink tracking and startup delay for EVPN-MH | Anuradha Karuppiah | 2020-10-27 | 1 | -4/+30

    Local ethernet segments are held in a protodown or error-disabled state
    if access to the VxLAN overlay is not ready -
    1. When FRR comes up the local-ESs/access-port are kept protodown
       for the startup-delay duration. During this time the underlay and
       EVPN routes via it are expected to converge.
    2. When all the uplinks/core-links attached to the underlay go down
       the access-ports are similarly protodowned.

    The ES-bond protodown state is propagated to each ES-bond member
    and programmed in the dataplane/kernel (per-bond-member).

    Configuring uplinks -
    vtysh -c "conf t" vtysh -c "interface swp4" vtysh -c "evpn mh uplink"

    Configuring startup delay -
    vtysh -c "conf t" vtysh -c "evpn mh startup-delay 100"

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    EVPN protodown display -
    ========================
    root@torm-11:mgmt:~# vtysh -c "show evpn"
    L2 VNIs: 10
    L3 VNIs: 3
    Advertise gateway mac-ip: No
    Advertise svi mac-ip: No
    Duplicate address detection: Disable
      Detection max-moves 5, time 180
    EVPN MH:
      mac-holdtime: 60s, neigh-holdtime: 60s
      startup-delay: 180s, start-delay-timer: 00:01:14 <<<<<<<<<<<<
      uplink-cfg-cnt: 4, uplink-active-cnt: 4
      protodown: startup-delay <<<<<<<<<<<<<<<<<<<<<<<
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    ES-bond protodown display -
    ===========================
    root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"
    Interface hostbond1 is up, line protocol is down
      Link ups: 0 last: (never)
      Link downs: 1 last: 2020/04/26 20:38:03.53
      PTM status: disabled
      vrf: default
      OS Description: Local Node/s torm-11 and Ports swp5 <==> Remote Node/s hostd-11 and Ports swp1
      index 58 metric 0 mtu 9152 speed 4294967295
      flags: <UP,BROADCAST,MULTICAST>
      Type: Ethernet
      HWaddr: 00:02:00:00:00:35
      Interface Type bond
      Master interface: bridge
      EVPN-MH: ES id 1 ES sysmac 00:00:00:00:01:11
      protodown: off rc: startup-delay <<<<<<<<<<<<<<<<<
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    ES-bond member protodown display -
    ==================================
    root@torm-11:mgmt:~# vtysh -c "show interface swp5"
    Interface swp5 is up, line protocol is down
      Link ups: 0 last: (never)
      Link downs: 3 last: 2020/04/26 20:38:03.52
      PTM status: disabled
      vrf: default
      index 7 metric 0 mtu 9152 speed 10000
      flags: <UP,BROADCAST,MULTICAST>
      Type: Ethernet
      HWaddr: 00:02:00:00:00:35
      Interface Type Other
      Master interface: hostbond1
      protodown: on rc: startup-delay <<<<<<<<<<<<<<<<
    root@torm-11:mgmt:~#
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
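The two protodown conditions can be modelled as a set of reasons that is then applied to every bond member. This is a sketch under stated assumptions: the function names and the "uplinks-down" label are hypothetical (only "startup-delay" appears in the displays above), and treating a switch with no configured uplinks as "not uplink-down" is an assumption the commit message does not confirm.

```python
def protodown_reasons(startup_delay_running, uplink_cfg_cnt, uplink_active_cnt):
    """Return the set of reasons for holding local ESs protodown."""
    reasons = set()
    if startup_delay_running:
        # Condition 1: FRR just came up, startup-delay timer still running.
        reasons.add("startup-delay")
    if uplink_cfg_cnt > 0 and uplink_active_cnt == 0:
        # Condition 2: all configured uplinks/core-links are down.
        # Assumption: with zero configured uplinks nothing is tracked.
        reasons.add("uplinks-down")
    return reasons


def bond_member_protodown(reasons, members):
    """Propagate the ES-bond protodown state to each bond member."""
    return {member: bool(reasons) for member in members}
```

With the values from the "show evpn" display (timer running, 4 of 4 uplinks active) the only reason is "startup-delay", and a member such as swp5 is protodowned.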
* zebra: handle local-es bridge port association | Anuradha Karuppiah | 2020-10-26 | 1 | -0/+1

    A local ES can be added to or removed from a bridge after it is created.
    When it becomes a bridge port member the dataplane attributes need to be
    programmed.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: changes for programming SPH, non-DF and backup NHG br-port attrs | Anuradha Karuppiah | 2020-10-26 | 1 | -0/+5

    The split horizon filter, non-DF block filter and backup nexthop group
    are passed as bridge port attributes to the dataplane.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: changes to run DF election | Anuradha Karuppiah | 2020-10-26 | 1 | -0/+17

    1. DF preference is configurable per-ES
    !
    interface hostbond1
     evpn mh es-df-pref 100 >>>>>>>>>>>
     evpn mh es-id 1
     evpn mh es-sys-mac 00:00:00:00:01:11
    !
    2. This parameter is sent to BGP and advertised via the ESR.
    3. The peer-ESs' DF params are sent to zebra (by BGP) and used
       for running the DF election.
    4. If the local VTEP becomes non-DF on an ES a block filter is
       programmed in the dataplane to drop de-capsulated BUM packets
       destined to that ES.

    Sample output
    =============
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    torm-11# sh evpn es
    Type: L local, R remote, N non-DF
    ESI                            Type ES-IF      VTEPs
    03:00:00:00:00:01:11:00:00:01  LRN  hostbond1  27.0.0.16
    03:00:00:00:00:01:22:00:00:02  LR   hostbond2  27.0.0.16
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    torm-11# sh evpn es 03:00:00:00:00:01:11:00:00:01
    ESI: 03:00:00:00:00:01:11:00:00:01
     Type: Local,Remote
     Interface: hostbond1
     State: up
     Ready for BGP: yes
     VNI Count: 10
     MAC Count: 2
     DF: status: non-df preference: 100 >>>>>>>>
     Nexthop group: 0x2000001
     VTEPs:
         27.0.0.16 df_alg: preference df_pref: 32767 nh: 0x100000d >>>>
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
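A preference-based election over the local and peer DF params can be sketched as below. The sample output (local preference 100, peer 27.0.0.16 with df_pref 32767, local status non-df) is consistent with "highest preference wins". The function name, the local VTEP IP 27.0.0.15 used in the usage note, and the lowest-IP tie-break are assumptions for illustration; the commit message does not spell out the tie-break.

```python
import ipaddress


def is_local_df(local_ip, local_pref, peers):
    """Return True if the local VTEP wins the DF election on an ES.

    peers is an iterable of (vtep_ip, df_pref) tuples learnt from BGP.
    """
    candidates = [(local_pref, local_ip)]
    candidates += [(pref, ip) for ip, pref in peers]
    # Highest preference wins; ties go to the lowest IP (assumption).
    winner = min(
        candidates,
        key=lambda c: (-c[0], int(ipaddress.ip_address(c[1]))),
    )
    return winner[1] == local_ip
```

For example, `is_local_df("27.0.0.15", 100, [("27.0.0.16", 32767)])` is False, so the local VTEP would program the non-DF block filter for that ES.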
* zebra: re-name some mh functions to make the code more readable | Anuradha Karuppiah | 2020-09-16 | 1 | -2/+2

    As a part of the re-factoring some of the evpn_vni_es apis got
    re-named as evpn_evpn_es. Changed them to evpn_es_evi to make it
    common to vxlan and mpls.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* zebra: rename vni to evpn where appropriate | Pat Ruddy | 2020-08-12 | 1 | -14/+14

    The main zebra_vni_t hash structure has been renamed to zebra_evpn_t
    to allow for other transport underlays. Rename functions and variables
    to reflect this change.

    Signed-off-by: Pat Ruddy <pat@voltanet.io>
* zebra: support for MAC-IP sync routes | Anuradha Karuppiah | 2020-08-05 | 1 | -3/+14

    MAC-IP routes are used for syncing local entries across redundant
    switches in an EVPN-MH setup. A path from a peer that has a local
    ES as destination is tagged as a SYNC path. The SYNC path results in the
    addition of a local MAC and/or local neigh entry in zebra and in the
    dataplane.

    Implementation overview
    =======================
    1. Three new flags "local-inactive", "peer-active" and "peer-proxy"
       are maintained per-local-MAC and per-local-Neigh entry.
    2. The "peer-XXX" flags are set and cleared via SYNC path updates from
       BGP. Proxy sync paths result in the setting of the "peer-proxy" flag
       (and non-proxies result in "peer-active").
    3. A neigh entry that has a "peer-XXX" flag set is programmed as
       "static" in the dataplane.
    4. A MAC entry that has a "peer-XXX" flag set or is referenced by
       a sync-neigh entry (that has a "peer-XXX" flag set) is programmed
       as "static" in the dataplane.
    5. The sync-seq number is used to normalize the MM seq number across
       all the redundant switches, i.e. the max MM seq number across all
       switches is used by each of the switches. This commit also includes
       the changes needed for extended MM seq syncing.
    6. A MAC/neigh entry has to be local-active or peer-active to be sent
       to BGP. An entry that is NOT local-active is sent with the proxy flag
       (so BGP can "proxy" advertise it).
    7. The "peer-active" flag is aged out by zebra by using a hold_timer
       (this is instead of being abruptly dropped on SYNC path delete).
       This age-out is needed to handle peer-switch restart (procedures are
       specified in draft-rbickhart-evpn-ip-mac-proxy-adv). The holdtime
       needs to be sufficiently long to allow an external neighmgr daemon or
       the dataplane component to independently probe and establish local
       reachability of a host. The MAC and neigh hold time values are
       configurable.

    PS: In the future this probing may happen in FRR itself.

    CLI changes to display sync info
    ================================
    MAC
    ===
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    root@torm-11:mgmt:~# net show evpn mac vni 1000
    Number of MACs (local and remote) known for this VNI: 6
    Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
    MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
    00:02:00:00:00:25 local        vlan1000                       1000  0/0
    02:02:00:00:00:02 local  PI    hostbond1                      1000  0/0
    02:02:00:00:00:06 remote       03:00:00:00:00:02:11:00:00:01        0/0
    02:02:00:00:00:01 local  X     hostbond1                      1000  0/0
    00:00:00:00:00:11 local  PI    hostbond1                      1000  0/0
    02:02:00:00:00:05 remote       03:00:00:00:00:02:11:00:00:01        0/0
    root@torm-11:mgmt:~#
    root@torm-11:mgmt:~# net show evpn mac vni 1000 mac 00:00:00:00:00:11
    MAC: 00:00:00:00:00:11
     ESI: 03:00:00:00:00:01:11:00:00:01
     Intf: hostbond1(58) VLAN: 1000
     Sync-info: neigh#: 0 local-inactive peer-active >>>>>>>>>>>>
     Local Seq: 0 Remote Seq: 0
     Neighbors:
        No Neighbors
    root@torm-11:mgmt:~#
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    neigh
    =====
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    root@torm-11:mgmt:~# net show evpn arp vni 1003
    Number of ARPs (local and remote) known for this VNI: 4
    Flags: I=local-inactive, P=peer-active, X=peer-proxy
    Neighbor             Type  Flags State  MAC               Remote ES/VTEP Seq #'s
    2001:fee1:0:3::6     local       active 00:02:00:00:00:25                0/0
    45.0.3.66            local P     active 00:02:00:00:00:66                0/0
    45.0.3.6             local       active 00:02:00:00:00:25                0/0
    fe80::202:ff:fe00:25 local       active 00:02:00:00:00:25                0/0
    root@torm-11:mgmt:~#
    root@torm-11:mgmt:~# net show evpn arp vni 1003 ip 45.0.3.66
    IP: 45.0.3.66
     Type: local
     State: active
     MAC: 00:02:00:00:00:66
     Sync-info: peer-active >>>>>>>>>>>>>>>>
     Local Seq: 0 Remote Seq: 0
    root@torm-11:mgmt:~#
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
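Items 3 to 6 of the implementation overview reduce to a few predicates over the three sync flags. The sketch below is illustrative only: the class and function names are hypothetical and do not correspond to the zebra data structures, and "local-active" is modelled simply as the absence of "local-inactive".

```python
from dataclasses import dataclass


@dataclass
class SyncEntry:
    """Sync flags kept per local MAC or neigh entry."""
    local_inactive: bool = False
    peer_active: bool = False
    peer_proxy: bool = False


def program_as_static(entry, referenced_by_sync_neigh=False):
    """Items 3 and 4: static if any peer-XXX flag is set.

    For a MAC, a reference from a sync-neigh entry also makes it static.
    """
    return entry.peer_active or entry.peer_proxy or referenced_by_sync_neigh


def bgp_advertisement(entry):
    """Item 6: return None (not sent), "normal" or "proxy"."""
    local_active = not entry.local_inactive
    if not (local_active or entry.peer_active):
        return None
    return "normal" if local_active else "proxy"


def normalized_mm_seq(seqs):
    """Item 5: every switch uses the max MM seq seen across all switches."""
    return max(seqs)
```

A MAC shown with flags "PI" in the display (peer-active, local-inactive) is therefore programmed static and sent to BGP with the proxy flag.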
* zebra: Ethernet segment management and support for MAC-ECMP | Anuradha Karuppiah | 2020-08-05 | 1 | -0/+228

    1. Local ethernet segments are configured in zebra by attaching a
       local-es-id and sys-mac to an access interface -
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    !
    interface hostbond1
     evpn mh es-id 1
     evpn mh es-sys-mac 00:00:00:00:01:11
    !
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
       This info is then sent to BGP and used for the generation of
       EAD-per-ES routes.
    2. Access VLANs associated with an (ES) access port are translated into
       ES-EVI objects and sent to BGP. This is used by BGP for the
       generation of EAD-EVI routes.
    3. Remote ESs are imported by BGP and sent to zebra. A list of VTEPs
       is maintained per-remote ES in zebra. This list is used for the
       creation of the L2-NHG that is used for forwarding traffic.
    4. MAC entries with a non-zero ESI destination use the L2-NHG
       associated with the ESI for forwarding traffic over the VxLAN
       overlay.

    Please see zebra_evpn_mh.h for the datastruct organization details.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>