| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6vPE enables the announcement of IPv6 VPN prefixes through an IPv4 BGP
session. In this scenario, the next hop addresses for these prefixes are
represented in an IPv4-mapped IPv6 format, noted as ::ffff:[IPv4]. This
format indicates to the peer that it should route these IPv6 addresses
using information from the IPv4 nexthop. For example:
> Path Attribute - MP_REACH_NLRI
> [...]
> Address family identifier (AFI): IPv6 (2)
> Subsequent address family identifier (SAFI): Labeled VPN Unicast (128)
> Next hop: RD=0:0 IPv6=::ffff:192.0.2.5 RD=0:0 Link-local=fe80::501d:42ff:feef:b021
> Number of Subnetwork points of attachment (SNPA): 0
This rule is set out in RFC4798:
> The IPv4 address of the egress 6PE router MUST be encoded as an
> IPv4-mapped IPv6 address in the BGP Next Hop field.
However, in some situations, bgpd sends a standard nexthop IPv6 address
instead of an IPv4-mapped IPv6 address because the outgoing interface for
the BGP session has a valid IPv6 address. This is problematic because
the peer router may not be able to route the nexthop IPv6 address (ie.
if the outgoing interface has not IPv6).
Fix the issue by always sending a IPv4-mapped IPv6 address as nexthop
when the BGP session is on IPv4 and address family IPv6.
Link: https://datatracker.ietf.org/doc/html/rfc4798#section-2
Fixes: 92d6f76 ("lib,zebra,bgpd: Fix for nexthop as IPv4 mapped IPv6 address")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RFC 3032 defines:
A value of 2 represents the "IPv6 Explicit NULL Label".
This label value is only legal at the bottom of the label
stack. It indicates that the label stack must be popped,
and the forwarding of the packet must then be based on the
IPv6 header.
Before this patch we set 128, but it was even more wrong, because it was sent
in host-byte order, not the network-byte.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The advertised label value from mpls vpn routes is not modified
when the advertised next-hop is modified to next-hop-self.
Actually, the original label value received is redistributed as
is, whereas the new_label value bound in the nexthop label
bind entry should be used.
Only the VPN entries that contain MPLS information, and that
are redistributed between distinct peers, will have a label
value to advertise.
- no SRv6 attribute
- no local prefix
- no exported VPN prefixes from a VRF
If the advertisement to a given peer has the next-hop modified,
then the new label value will be picked up. The considered cases
are peers configured with 'next-hop-self' option, or ebgp peerings
without the 'next-hop-unchanged' option.
Note that the the NLRI format will follow the rfc3107 format, as
multiple label values for MPLS VPN NLRIs are not supported (the
rfc8277 is not supported).
Note also that the case where an outgoing route-map is applied to
the outgoing neighbor is not considered in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
|
|
|
|
|
|
|
| |
We should probably prevent any type of namespace collision
with something else.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
|
|
|
|
| |
Done with a combination of regex'ing and banging my head against a wall.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without this patch, this just crashes:
```
(gdb) bt
0 raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:51
1 0x00007f66d977b10c in core_handler (signo=11, siginfo=0x7ffd87aa0430, context=<optimized out>) at lib/sigevent.c:261
2 <signal handler called>
3 stream_put_labeled_prefix (s=s@entry=0x55bd3ce53050, p=p@entry=0x7ffd87aa0a20, label=label@entry=0x0, addpath_capable=<optimized out>, addpath_tx_id=addpath_tx_id@entry=1)
at lib/stream.c:1057
4 0x000055bd3bfba176 in bgp_packet_mpattr_prefix (s=s@entry=0x55bd3ce53050, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_LABELED_UNICAST, p=p@entry=0x7ffd87aa0a20, prd=prd@entry=0x0,
label=label@entry=0x0, num_labels=0, addpath_capable=true, addpath_tx_id=1, attr=0x7ffd87aa2c20) at bgpd/bgp_attr.c:3964
5 0x000055bd3bfba2b5 in bgp_packet_attribute (bgp=0x55bd3cd8e470, bgp@entry=0x0, peer=peer@entry=0x55bd3cf21fc0, s=s@entry=0x55bd3ce53050, attr=attr@entry=0x7ffd87aa2c20,
vecarr=vecarr@entry=0x7ffd87aa0a10, p=p@entry=0x7ffd87aa0a20, afi=AFI_IP, safi=SAFI_LABELED_UNICAST, from=0x7f66d9ba9010, prd=0x0, label=0x0, num_labels=0, addpath_capable=true,
addpath_tx_id=1, bpi=0x0) at bgpd/bgp_attr.c:4139
6 0x000055bd3c04d455 in subgroup_default_update_packet (subgrp=subgrp@entry=0x55bd3cd885b0, attr=attr@entry=0x7ffd87aa2c20, from=from@entry=0x7f66d9ba9010) at bgpd/bgp_updgrp_packet.c:1129
7 0x000055bd3c04a9a5 in subgroup_default_originate (subgrp=0x55bd3cd885b0, withdraw=withdraw@entry=0) at bgpd/bgp_updgrp_adv.c:972
8 0x000055bd3c022668 in bgp_default_originate (peer=peer@entry=0x7f66d574a010, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_LABELED_UNICAST, withdraw=withdraw@entry=0)
at bgpd/bgp_route.c:5037
9 0x000055bd3c0922e0 in peer_default_originate_set (peer=0x7f66d574a010, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_LABELED_UNICAST, rmap=rmap@entry=0x0, route_map=route_map@entry=0x0)
at bgpd/bgpd.c:5428
10 0x000055bd3c076c07 in peer_default_originate_set_vty (set=1, rmap=0x0, safi=SAFI_LABELED_UNICAST, afi=AFI_IP, peer_str=<optimized out>, vty=0x55bd3ce4c900) at bgpd/bgp_vty.c:6941
11 neighbor_default_originate (self=<optimized out>, vty=0x55bd3ce4c900, argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_vty.c:6958
12 0x00007f66d9721dc0 in cmd_execute_command_real (vline=vline@entry=0x55bd3cd874d0, vty=vty@entry=0x55bd3ce4c900, cmd=cmd@entry=0x0, up_level=up_level@entry=0, filter=FILTER_RELAXED)
at lib/command.c:996
13 0x00007f66d9721f39 in cmd_execute_command (vline=vline@entry=0x55bd3cd874d0, vty=vty@entry=0x55bd3ce4c900, cmd=cmd@entry=0x0, vtysh=vtysh@entry=0) at lib/command.c:1055
14 0x00007f66d9722162 in cmd_execute (vty=vty@entry=0x55bd3ce4c900, cmd=cmd@entry=0x55bd3cd8a230 "neighbor 192.168.34.4 default-originate ", matched=matched@entry=0x0, vtysh=vtysh@entry=0)
at lib/command.c:1223
15 0x00007f66d9792337 in vty_command (vty=vty@entry=0x55bd3ce4c900, buf=<optimized out>) at lib/vty.c:486
16 0x00007f66d9792570 in vty_execute (vty=0x55bd3ce4c900) at lib/vty.c:1249
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just do not reset next-hop for MPLS VPN routes.
Example of 172.16.255.1/32 (using extended next-hop capability):
```
pe2# sh bgp ipv4 vpn
BGP table version is 4, local router ID is 10.10.10.20, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.1.2:2
*>i10.0.0.0/24 2001:db8:1::1 0 100 0 65000 ?
UN=2001:db8:1::1 EC{192.168.1.2:2} label=1111 type=bgp, subtype=0
*>i172.16.255.1/32 2001:db8::1 0 100 0 65000 ?
UN=2001:db8::1 EC{192.168.1.2:2} label=1111 type=bgp, subtype=0
*>i192.168.1.0/24 2001:db8:1::1 0 100 0 65000 ?
UN=2001:db8:1::1 EC{192.168.1.2:2} label=1111 type=bgp, subtype=0
*>i192.168.2.0/24 2001:db8:1::1 100 0 65000 ?
UN=2001:db8:1::1 EC{192.168.1.2:2} label=1111 type=bgp, subtype=0
Route Distinguisher: 192.168.2.2:2
*> 10.0.0.0/24 192.168.2.1@4< 0 50 0 65000 ?
UN=192.168.2.1 EC{192.168.2.2:2} label=2222 type=bgp, subtype=5
*> 172.16.255.1/32 192.168.2.1@4< 50 0 65000 ?
UN=192.168.2.1 EC{192.168.2.2:2} label=2222 type=bgp, subtype=5
*> 192.168.1.0/24 192.168.2.1@4< 50 0 65000 ?
UN=192.168.2.1 EC{192.168.2.2:2} label=2222 type=bgp, subtype=5
*> 192.168.2.0/24 192.168.2.1@4< 0 50 0 65000 ?
UN=192.168.2.1 EC{192.168.2.2:2} label=2222 type=bgp, subtype=5
Displayed 8 routes and 8 total paths
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|\
| |
| | |
bgpd: Implement AIGP
|
| |
| |
| |
| |
| |
| | |
https://www.rfc-editor.org/rfc/rfc7311.html
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given that the following topology, route server MUST not modify NEXT_HOP
attribute because route server isn't in the actual routing path. This
behavior is required to comply RFC7947
(Router A) <-(eBGP peer)-> (Route Server) <-(eBGP peer)-> (Router B)
RFC7947 says as follows:
> As the route server does not participate in the actual routing of
> traffic, the NEXT_HOP attribute MUST be passed unmodified to the route
> server clients, similar to the "third-party" next-hop
> feature described in Section 5.1.3. of [RFC4271].
However, current FRR is violating RFC7947 in some cases. If routers and
route server established BGP peer over IPv6 connection and routers
advertise ipv4-vpn routes through route server, route server will modify
NEXT_HOP attribute in these advertisements.
This is because the condition to check whether NEXT_HOP attribute should
be changed or not is wrong. We should use (afi, safi) as the key to
check, but (nhafi, safi) is actually used. This causes the RFC7947
violation.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Description:
Change is intended for fixing the inconsistencies present
while adjusting the SNT counters with default originate.
- SNT counter gets incremented on every change of policy associated
with default-originate, leading to inconsistencies.
- This fix has been added to ensure that the SNT counters gets
incremented and decremented only once during the creation and
deletion workflow of default-originate, and prevents
incrementing the counter during update flow.
Co-authored-by: Abhinay Ramesh <rabhinay@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
|
|
|
|
|
|
| |
Rename addpath_encode[d] to addpath_capable to be consistent.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Abstract:
- The command "neighbor PEER maximum-prefix-out NUMBER" cannot be applied
without clearing the BGP neighbor.
- Apply the maximum-prefix-out value as soon as it is modified without
clearing the neighbor.
subgroup_update_packet() and subgroup_withdraw_packet() respectively
manages the announcement and withdrawal BGP message to the peer.
subgrp->scount counter counts the number of sent prefixes.
Before the patch, the maximum out prefix limitation was applied in
subgroup_update_packet() in order that subgrp->scount never exceeds the
limit. Setting a limit inferior to the effective number of sent prefix
did not result in sending any withdrawal message to reduce the number of
sent prefixes. Without clearing the BGP neighbor, the limitation only
applied to the announcement of new prefixes when the limitation was
over.
With the patch, the limitation is checked in subgroup_announce_check().
The function is intended to say whether a prefix has to be announced in
regards to the prefix-list, route-map... Now when a maximum-prefix-out
value is changed/removed, the neighbor AFI/SAFI table is re-parsed in
the same way as for the application of route-map, prefix-lists...
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
|
|
|
|
|
|
| |
Before we didn't count default-originate to pfxSnt.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Gateway IP overlay index is generated for EVPN RT-5 when following CLI is
configured.
router bgp 100 vrf vrf-blue
address-family l2vpn evpn
advertise ipv4 unicast gateway-ip
advertise ipv6 unicast gateway-ip
BGP nexthop of the VRF IP/IPv6 route is set as the gateway IP of the
corresponding EVPN RT-5
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
|
|
|
|
| |
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement https://www.rfc-editor.org/rfc/rfc8654.txt
```
> | jq '."192.168.10.25".neighborCapabilities.extendedMessage'
"advertisedAndReceived"
```
Another side is Bird:
```
BIRD 2.0.7 ready.
Name Proto Table State Since Info
v4 BGP --- up 19:39:15.689 Established
BGP state: Established
Neighbor address: 192.168.10.123
Neighbor AS: 65534
Local AS: 65025
Neighbor ID: 192.168.100.1
Local capabilities
Multiprotocol
AF announced: ipv4
Route refresh
Extended message
Graceful restart
4-octet AS numbers
Enhanced refresh
Long-lived graceful restart
Neighbor capabilities
Multiprotocol
AF announced: ipv4
Route refresh
Extended message
Graceful restart
4-octet AS numbers
ADD-PATH
RX: ipv4
TX:
Enhanced refresh
Session: external AS4
Source address: 192.168.10.25
Hold timer: 140.139/180
Keepalive timer: 9.484/60
Channel ipv4
State: UP
Table: master4
Preference: 100
Input filter: ACCEPT
Output filter: ACCEPT
Routes: 9 imported, 3 exported, 8 preferred
Route change stats: received rejected filtered ignored accepted
Import updates: 9 0 0 0 9
Import withdraws: 2 0 --- 2 0
Export updates: 11 8 0 --- 3
Export withdraws: 0 --- --- --- 0
BGP Next hop: 192.168.10.25
```
Tested at least as well with to make sure it works with backward compat.:
ExaBGP 4.0.2-1c737d99.
Arista vEOS 4.21.14M
Testing by injecint 10k routes with:
```
sharp install routes 172.16.0.1 nexthop 192.168.10.123 10000
```
Before extended message support:
```
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:51 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 4096 (max message len: 4096) numpfx 809
2021/03/01 07:18:52 BGP: u1:s1 send UPDATE len 2186 (max message len: 4096) numpfx 427
2021/03/01 07:18:53 BGP: u1:s1 send UPDATE len 3421 (max message len: 4096) numpfx 674
```
After extended message support:
```
2021/03/01 07:20:11 BGP: u1:s1 send UPDATE len 50051 (max message len: 65535) numpfx 10000
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
When FRR creates a adj_out data structure we lock the `struct
bgp_dest` node associated with it. On freeing of this data
structure and removing the lock it was not associated with
the actual free of the adjacency structure. Let's clean up
the lock/unlock to be centralized to the alloc/free of the adj_out.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
|
|
|
|
| |
Remove all dead #if 0 code from bgpd.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|\
| |
| |
| |
| | |
ton31337/fix/replace_sizeof_instead_of_constant_for_bgp_dump_attr
bgpd: Use sizeof() in bgp_dump_attr()
|
| |
| |
| |
| | |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|/
|
|
|
|
|
|
|
|
| |
Added a macro to validate the v4 mapped v6 address.
Modified bgp receive & send updates for v4 mapped v6 address as
nexthop and installing it as recursive nexthop in RIB.
Minor change in fpm while sending the routes for nexthop as
v4 mapped v6 address.
Signed-off-by: Kaushik <kaushik@niralnetworks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
These are completely pointless and break coccinelle string replacements.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --pri8-16-32 `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
|
|
|
|
|
|
|
|
|
| |
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
|
|
|
|
|
| |
Some were converted to bool, where true/false status is needed.
Converted to void only those, where the return status was only false or true.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
| |
No need to check if the variable is NULL and return after assert.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
| |
This function was heavily indented, reformat to reduce indentation
levels a bit.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
| |
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem reported that when a "neighbor x.x.x.x route-map FOO in"
set a next-hop value, that modified next-hop value was also sent
to eBGP peers. This is incorrect since bgp is expected to set
next-hop to self when sending to eBGP peers unless third party
next-hop on a shared segment is true. This fix modifies the
behavior to stop sending the modified next-hop to eBGP peers
if the route-map was applied inbound on another peer.
Ticket: CM-26025
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| | |
The FIFO_* stuff in lib/fifo.h is no different from a simple unsorted
list. Just use DECLARE_LIST here so we can get rid of FIFO_*.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
|
|/
|
|
|
|
|
|
| |
Prevent the ebgp sender from changing the nexthop( which is same as the ebgp neighbour ipv6 address),
while sending updates to its ipv6 neighbor.So,if the nexthop of the ipv6 route is same as the ipv6
neighbour address do not change the next hop to your own ip.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
|
|
|
|
|
|
| |
No cast necessary for void *
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
|
|
| |
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by Mitchell Skiba <mskiba@amazon.com>
|
|
|
|
|
|
|
| |
The tx_id_buf was not being set to anything in some cases,
make sure it's a null string before using.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
|
|
|
|
| |
Convert the binfo variable to path.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do a straight conversion of `struct bgp_info` to `struct bgp_path_info`.
This commit will setup the rename of variables as well.
This is being done because `struct bgp_info` is not descriptive
of what this data actually is. It is path information for routes
that we keep to build the actual routes nexthops plus some extra
information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|
|
|
|
| |
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
|
|
| |
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
|
|
|
|
| |
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
|