author | Sarita Patra <saritap@vmware.com> | 2020-08-21 08:33:09 +0200 |
---|---|---|
committer | Sarita Patra <saritap@vmware.com> | 2020-08-21 08:36:22 +0200 |
commit | 6c4d8732e975cf67a815fc97430166bb4d7008aa (patch) | |
tree | 190e5a6189c5036230f7a2f1513dc07ea89d2976 /bgpd/bgp_fsm.c | |
parent | bgpd: Don't stop hold timer in OpenConfirm State (diff) | |
bgpd: Fix BGP session stuck in OpenConfirm state
Issue:
1. Initially BGP starts listening on the socket.
2. The start timer expires, BGP tries to connect to the peer, and the
peer moves from Idle->Connect (let's say peer datastructure X).
3. The connect for X succeeds, so X has moved from Idle->Connect with
FD-x.
4. An incoming connection is accepted and a new peer datastructure Y
is created with FD-y; it moves from Idle->Active.
5. Peer datastructure Y (FD-y) sends out an OPEN and moves from
Active->OpenSent.
6. Peer datastructure Y (FD-y) receives an OPEN and moves from
OpenSent->OpenConfirm.
7. Meanwhile, peer datastructure X (FD-x) sends out an OPEN message
and moves from Connect->OpenSent.
8. For peer datastructure Y (FD-y) a keepalive is received and it is
moved from OpenConfirm->Established.
9. In this case peer datastructure Y (FD-y) is an accepted connection,
so we try to copy all its parameters to peer datastructure X and
delete Y.
10. During this process the TCP connection for the accepted connection
(FD-y) goes down, and hence getting the remote address and port fails.
11. With this failure the bgp_stop function is called for both peer
datastructure X and peer datastructure Y.
12. By this time all the parameters, including state, of
datastructures X and Y have been exchanged. Peer Y (FD-y) still had
state OpenConfirm when it entered this function, and that state has
been moved to peer datastructure X.
13. bgp_stop stops all the timers and takes further action only if the
peer is in Established state. Since peer datastructures X and Y are
not in Established state (in this function), it simply stops all
timers, closes the socket, and sets the socket of both peer
datastructures to -1.
14. Peer datastructure Y is deleted, as it was created for an accepted
connection, whereas peer datastructure X is kept, as it was created
from configuration.
15. Peer datastructure X now holds state OpenConfirm without any
timers running.
16. With this, no new incoming connection can ever be established, as
the configured connection X is stuck in OpenConfirm.
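Steps 9 to 15 can be replayed with a small toy model. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins invented for illustration, not FRR's real `struct peer` or `bgp_stop`, and the model assumes (as step 13 describes) that the direct cleanup call stops timers and closes the socket without running any state transition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins; not FRR's real types. */
enum toy_status {
	TOY_IDLE, TOY_CONNECT, TOY_ACTIVE,
	TOY_OPENSENT, TOY_OPENCONFIRM, TOY_ESTABLISHED
};

struct toy_peer {
	enum toy_status status;
	int fd;              /* -1 means "no socket" */
	bool timers_running;
};

/* Models steps 9 and 12: the two structures exchange state and socket. */
void toy_swap_conn(struct toy_peer *x, struct toy_peer *y)
{
	struct toy_peer tmp = *x;
	*x = *y;
	*y = tmp;
}

/* Models step 13: cleanup only, the status field is never touched. */
void toy_stop_direct(struct toy_peer *p)
{
	p->timers_running = false;
	p->fd = -1;
}

/* Step 15: a handshake state with no socket and no timer to leave it. */
bool toy_is_stuck(const struct toy_peer *p)
{
	return p->status != TOY_IDLE && p->fd == -1 && !p->timers_running;
}

/* Replays steps 9-15 and returns the configured peer X afterwards. */
struct toy_peer toy_replay_issue(void)
{
	struct toy_peer x = { TOY_OPENSENT, 10, true };    /* configured, FD-x */
	struct toy_peer y = { TOY_OPENCONFIRM, 11, true }; /* accepted, FD-y */

	toy_swap_conn(&x, &y);  /* X now carries OpenConfirm and FD-y */
	toy_stop_direct(&x);    /* remote address lookup failed */
	toy_stop_direct(&y);
	return x;               /* Y would be deleted; X is kept */
}
```

In this model the peer returned by `toy_replay_issue()` is left in `TOY_OPENCONFIRM` with `fd == -1` and no timers, which is the stuck condition described in steps 15 and 16.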
Fix:
While transferring the peer datastructure Y FD-y (accepted connection)
to the peer datastructure X, if the TCP connection for FD-y goes down, then
1. Raise the FSM event BGP_Stop for X (the FSM does the cleanup with
bgp_stop and moves the state to Idle) and
2. Raise the FSM event BGP_Stop for Y (the FSM does the cleanup with
bgp_stop and Y gets deleted since it is an accept connection).
Signed-off-by: Sarita Patra <saritap@vmware.com>
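The difference between calling the cleanup action directly and raising it as an FSM event, as the fix above describes, can be sketched as follows. All `toy_*` names are hypothetical and the transition logic is reduced to a single Stop event; it is meant as an illustration of the idea, not as a copy of bgpd's FSM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified types; not FRR's real definitions. */
enum toy_status { TOY_IDLE, TOY_OPENSENT, TOY_OPENCONFIRM };
enum toy_event { TOY_EV_STOP };

struct toy_peer {
	enum toy_status status;
	int fd;              /* -1 means "no socket" */
	bool timers_running;
	bool from_accept;    /* created for an incoming connection */
	bool deleted;
};

/* Action only: cleanup without a state change, which is all a direct
 * call provides in this model. */
void toy_stop_action(struct toy_peer *p)
{
	p->timers_running = false;
	p->fd = -1;
}

/* Event path: run the action, then apply the transition for the event. */
void toy_fsm_event(struct toy_peer *p, enum toy_event ev)
{
	if (ev == TOY_EV_STOP) {
		toy_stop_action(p);
		p->status = TOY_IDLE;     /* state leaves OpenConfirm */
		if (p->from_accept)
			p->deleted = true;    /* accepted peers are dropped */
	}
}
```

With the event path, a configured peer that held `TOY_OPENCONFIRM` ends up in `TOY_IDLE`, so in this model it can start over instead of blocking later connections.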
Diffstat (limited to '')
-rw-r--r-- | bgpd/bgp_fsm.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c
index 357061b44..c8e5a308e 100644
--- a/bgpd/bgp_fsm.c
+++ b/bgpd/bgp_fsm.c
@@ -304,8 +304,8 @@ static struct peer *peer_xfer_conn(struct peer *from_peer)
 					? "accept"
 					: ""),
 				peer->host, peer->fd, from_peer->fd);
-		bgp_stop(peer);
-		bgp_stop(from_peer);
+		BGP_EVENT_ADD(peer, BGP_Stop);
+		BGP_EVENT_ADD(from_peer, BGP_Stop);
 		return NULL;
 	}
 	if (from_peer->status > Active) {
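The change above replaces two direct function calls with two queued events. A minimal sketch of that pattern follows, assuming that `BGP_EVENT_ADD` schedules the event for later processing rather than running the handler inline (an assumption here; the macro's definition in bgpd is the authority). The `toy_*` queue, handler type, and peer struct are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical minimal peer and handler types for the illustration. */
struct toy_peer { int id; int stopped; };
typedef void (*toy_handler)(struct toy_peer *);

struct toy_queue {
	toy_handler fn[TOY_QUEUE_MAX];
	struct toy_peer *arg[TOY_QUEUE_MAX];
	size_t len;
};

/* Records the event; nothing runs yet. Returns 0, or -1 if the queue is full. */
int toy_event_add(struct toy_queue *q, toy_handler fn, struct toy_peer *p)
{
	if (q->len == TOY_QUEUE_MAX)
		return -1;
	q->fn[q->len] = fn;
	q->arg[q->len] = p;
	q->len++;
	return 0;
}

/* Runs all queued events in FIFO order and returns how many ran. */
size_t toy_run_events(struct toy_queue *q)
{
	size_t n = q->len;

	for (size_t i = 0; i < n; i++)
		q->fn[i](q->arg[i]);
	q->len = 0;
	return n;
}

/* Stand-in for the Stop handling of one peer. */
void toy_on_stop(struct toy_peer *p)
{
	p->stopped = 1;
}
```

The caller can return immediately after `toy_event_add()`, and the stop handling for each peer runs later from the event loop, once per queued event.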