Introduction to BGP Analysis using Batfish

Network engineers routinely need to validate BGP configuration and session status in the network. They often do so by connecting to multiple network devices and running a series of show ip bgp commands. This distributed debugging is complex even in a moderately sized network. It is also reactive: by the time you check, the configuration changes are already in the network.

Batfish allows network engineers to proactively validate BGP configuration to ensure sessions are compatibly configured and will be established, thereby avoiding potential network outages.

In this notebook, we will look at how you can extract BGP configuration and session status information from Batfish.

[1]:
# Import packages
%run startup.py
bf = Session(host="localhost")

Initializing the Network and Snapshot

SNAPSHOT_PATH below can be updated to point to a custom snapshot directory; see the Batfish instructions for how to package data for analysis. More example networks are available in the networks folder of the Batfish repository.

[2]:
# Initialize a network and snapshot
NETWORK_NAME = "example_network"
SNAPSHOT_NAME = "example_snapshot"

SNAPSHOT_PATH = "networks/example-bgp"

bf.set_network(NETWORK_NAME)
bf.init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
[2]:
'example_snapshot'

The network snapshot that we initialized above is illustrated below. You can download/view devices’ configuration files here.

[Figure: example-bgp-network]

All of the information we will show you in this notebook is dynamically computed by Batfish based on the configuration files for the network devices.

View BGP Configuration for ALL devices

Batfish makes BGP configuration settings in the network easily accessible. Let’s start with the configuration attributes of the BGP process on every device running BGP, using the question bf.q.bgpProcessConfiguration.

[3]:
# Get BGP process configuration information for ALL devices
bgp_config = bf.q.bgpProcessConfiguration().answer().frame()
bgp_config
[3]:
Node VRF Router_ID Confederation_ID Confederation_Members Multipath_EBGP Multipath_IBGP Multipath_Match_Mode Neighbors Route_Reflector Tie_Breaker
0 as2border2 default 2.1.1.2 None None True True EXACT_PATH ['2.1.2.1', '2.1.2.2', '10.23.21.3'] False ARRIVAL_ORDER
1 as3core1 default 3.10.1.1 None None True True EXACT_PATH ['3.1.1.1', '3.2.2.2'] True ARRIVAL_ORDER
2 as2border1 default 2.1.1.1 None None True True EXACT_PATH ['2.1.2.1', '2.1.2.2', '10.12.11.1'] False ARRIVAL_ORDER
3 as1core1 default 1.10.1.1 None None True True EXACT_PATH ['1.1.1.1', '1.2.2.2'] True ARRIVAL_ORDER
4 as1border1 default 1.1.1.1 None None True True EXACT_PATH ['1.10.1.1', '3.2.2.2', '5.6.7.8', '10.12.11.2'] False ARRIVAL_ORDER
5 as2core1 default 2.1.2.1 None None True True EXACT_PATH ['2.1.1.1', '2.1.1.2', '2.1.3.1', '2.1.3.2'] True ARRIVAL_ORDER
6 as3border2 default 3.2.2.2 None None True True EXACT_PATH ['3.10.1.1', '10.13.22.1'] False ARRIVAL_ORDER
7 as2core2 default 2.1.2.2 None None True True EXACT_PATH ['2.1.1.1', '2.1.1.2', '2.1.3.1', '2.1.3.2'] True ARRIVAL_ORDER
8 as1border2 default 1.2.2.2 None None True True EXACT_PATH ['1.10.1.1', '10.13.22.3', '10.14.22.4'] False ARRIVAL_ORDER
9 as2dept1 default 2.1.4.1 None None True True EXACT_PATH ['2.34.101.3', '2.34.201.3'] False ARRIVAL_ORDER
10 as3border1 default 3.1.1.1 None None True True EXACT_PATH ['3.10.1.1', '10.23.21.2'] False ARRIVAL_ORDER
11 as2dist2 default 2.1.3.2 None None True True EXACT_PATH ['2.1.2.1', '2.1.2.2', '2.34.0.0/16'] False ARRIVAL_ORDER
12 as2dist1 default 2.1.3.1 None None True True EXACT_PATH ['2.1.2.1', '2.1.2.2', '2.34.0.0/16'] False ARRIVAL_ORDER

Now let’s drill into the configuration of a specific BGP session. Let’s look at the sessions on as2dept1. To do this, we will use the bf.q.bgpPeerConfiguration question.

[4]:
# Get the BGP peer configuration for as2dept1
bgp_peer_config = bf.q.bgpPeerConfiguration(nodes='as2dept1').answer().frame()
bgp_peer_config
[4]:
Node VRF Local_AS Local_IP Local_Interface Confederation Remote_AS Remote_IP Description Route_Reflector_Client Cluster_ID Peer_Group Import_Policy Export_Policy Send_Community Is_Passive
0 as2dept1 default 65001 2.34.101.4 None None 2 2.34.101.3 None False None as2 ['as2_to_dept'] ['dept_to_as2'] True False
1 as2dept1 default 65001 2.34.201.4 None None 2 2.34.201.3 None False None as2 ['as2_to_dept'] ['dept_to_as2'] True False

View BGP Session Status

Now that we have seen the configuration of each peer on as2dept1, let’s check that it is compatible with the configuration on the remote end of each session.

The bgpSessionCompatibility question allows you to ensure that BGP sessions are compatibly configured, so that if there is IP reachability between the peers the sessions will be established. The compatibility check verifies that the remote-as matches on both ends, that the correct update source is specified, that the peer IP addresses match up, and so on.
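As a rough illustration of what such a check involves, the sketch below compares the two ends of one session in plain Python. It is not Batfish's implementation. The `PeerConfig` class and the `is_compatible` function are hypothetical names, and the as2dist1-side values (its remote AS, and that it accepts peers from 2.34.0.0/16) are assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PeerConfig:
    """Hypothetical, simplified view of one side of a BGP session."""
    node: str
    local_as: int
    local_ip: str
    remote_as: int
    remote_prefix: str  # a single peer IP (/32) or a listen range


def is_compatible(a: PeerConfig, b: PeerConfig) -> bool:
    """Toy check: AS numbers and peer addresses must agree on both ends."""
    as_ok = a.remote_as == b.local_as and b.remote_as == a.local_as
    a_accepts_b = ipaddress.ip_address(b.local_ip) in ipaddress.ip_network(a.remote_prefix)
    b_accepts_a = ipaddress.ip_address(a.local_ip) in ipaddress.ip_network(b.remote_prefix)
    return as_ok and a_accepts_b and b_accepts_a


# as2dept1 values come from the bgpPeerConfiguration output above.
dept = PeerConfig("as2dept1", 65001, "2.34.101.4", 2, "2.34.101.3/32")
# as2dist1 values are assumptions made for this example.
dist = PeerConfig("as2dist1", 2, "2.34.101.3", 65001, "2.34.0.0/16")
print(is_compatible(dept, dist))

# A mismatched remote-as on one end (hypothetical) breaks compatibility.
wrong_as = PeerConfig("as2dist1", 2, "2.34.101.3", 65002, "2.34.0.0/16")
print(is_compatible(dept, wrong_as))
```

Batfish performs checks of this kind across every configured peer in the snapshot, so you do not have to pair up the two ends by hand.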

[5]:
# Check if the BGP sessions on as2dept1 are properly configured
bgpSessCompat = bf.q.bgpSessionCompatibility(nodes='as2dept1').answer().frame()
bgpSessCompat
[5]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Configured_Status
0 as2dept1 default 65001 None 2.34.101.4 2 as2dist1 None 2.34.101.3 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH
1 as2dept1 default 65001 None 2.34.201.4 2 as2dist2 None 2.34.201.3 ['IPV4_UNICAST'] EBGP_SINGLEHOP UNIQUE_MATCH

Both of the configured BGP peers on as2dept1 are compatible; we know this because the Configured_Status is UNIQUE_MATCH. Now let’s check whether the sessions are established.

[6]:
# Check if the BGP sessions on as2dept1 are ESTABLISHED
bgpSessStat = bf.q.bgpSessionStatus(nodes='as2dept1').answer().frame()
bgpSessStat
[6]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
0 as2dept1 default 65001 None 2.34.101.4 2 as2dist1 None 2.34.101.3 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED
1 as2dept1 default 65001 None 2.34.201.4 2 as2dist2 None 2.34.201.3 ['IPV4_UNICAST'] EBGP_SINGLEHOP ESTABLISHED

Both sessions are established. Let’s see if there are any configured BGP sessions, on any other device in the network, that are not established.

[7]:
# Find any BGP sessions in the network that are NOT ESTABLISHED
bgpSessStat = bf.q.bgpSessionStatus().answer().frame()
bgpSessStat[bgpSessStat['Established_Status'] != 'ESTABLISHED']
[7]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
1 as1border1 default 1 None None 666 None None 3.2.2.2 [] EBGP_SINGLEHOP NOT_COMPATIBLE
2 as1border1 default 1 None None 555 None None 5.6.7.8 [] EBGP_SINGLEHOP NOT_COMPATIBLE
6 as1border2 default 1 None 10.14.22.1 4 None None 10.14.22.4 [] EBGP_SINGLEHOP NOT_COMPATIBLE
9 as2border1 default 2 None 2.1.1.1 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
12 as2border2 default 2 None 2.1.1.2 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
15 as2core1 default 2 None 2.1.2.1 2 as2border1 None 2.1.1.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
16 as2core1 default 2 None 2.1.2.1 2 as2border2 None 2.1.1.2 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
17 as2core1 default 2 None 2.1.2.1 2 as2dist1 None 2.1.3.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
18 as2core1 default 2 None 2.1.2.1 2 as2dist2 None 2.1.3.2 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
25 as2dist1 default 2 None 2.1.3.1 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
28 as2dist2 default 2 None 2.1.3.2 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED

Looking at the Established_Status column we see that there are a lot of sessions that are configured, but not established. Let’s dig into these issues.
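To get a quick sense of how the problems split up, the rows can be tallied by status. The sketch below uses plain Python on (Node, Remote_IP, Established_Status) tuples copied by hand from the output above; the variable names are illustrative only.

```python
from collections import Counter

# (Node, Remote_IP, Established_Status) copied from the filtered output above.
not_established = [
    ("as1border1", "3.2.2.2", "NOT_COMPATIBLE"),
    ("as1border1", "5.6.7.8", "NOT_COMPATIBLE"),
    ("as1border2", "10.14.22.4", "NOT_COMPATIBLE"),
    ("as2border1", "2.1.2.1", "NOT_ESTABLISHED"),
    ("as2border2", "2.1.2.1", "NOT_ESTABLISHED"),
    ("as2core1", "2.1.1.1", "NOT_ESTABLISHED"),
    ("as2core1", "2.1.1.2", "NOT_ESTABLISHED"),
    ("as2core1", "2.1.3.1", "NOT_ESTABLISHED"),
    ("as2core1", "2.1.3.2", "NOT_ESTABLISHED"),
    ("as2dist1", "2.1.2.1", "NOT_ESTABLISHED"),
    ("as2dist2", "2.1.2.1", "NOT_ESTABLISHED"),
]

# Count how many rows fall under each status.
status_counts = Counter(status for _, _, status in not_established)
for status, count in sorted(status_counts.items()):
    print(status, count)
```

The tally gives 3 NOT_COMPATIBLE rows and 8 NOT_ESTABLISHED rows, which are the two groups debugged in the rest of this notebook.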

First, let’s find all of the sessions that are not compatibly configured. These are the sessions whose Configured_Status is neither UNIQUE_MATCH nor DYNAMIC_MATCH. The latter applies to dynamic peers (peers configured to use a listen-range).

Debug NOT_COMPATIBLE sessions

[8]:
# Find BGP sessions that are not compatibly configured, i.e., Batfish does not identify them as a UNIQUE_MATCH or a DYNAMIC_MATCH
bgpSessCompat = bf.q.bgpSessionCompatibility().answer().frame()
bgpSessCompat[~bgpSessCompat['Configured_Status'].isin(["UNIQUE_MATCH", "DYNAMIC_MATCH"])]
[8]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Configured_Status
1 as1border1 default 1 None None 666 None None 3.2.2.2 [] EBGP_SINGLEHOP NO_LOCAL_IP
2 as1border1 default 1 None None 555 None None 5.6.7.8 [] EBGP_SINGLEHOP NO_LOCAL_IP
6 as1border2 default 1 None 10.14.22.1 4 None None 10.14.22.4 [] EBGP_SINGLEHOP UNKNOWN_REMOTE
13 as2border2 default 2 None None 2 None None 2.1.2.2 [] IBGP LOCAL_IP_UNKNOWN_STATICALLY

We see 4 entries in this table even though we only saw 3 BGP sessions with the status NOT_COMPATIBLE. This means that, despite not having a Configured_Status of UNIQUE_MATCH or DYNAMIC_MATCH, one of these sessions was in fact established. This typically occurs when the misconfiguration is such that the session can only be established when initiated from one side, but not the other. The likely candidate in this output is the as2border2 session to 2.1.2.2, which has a status of LOCAL_IP_UNKNOWN_STATICALLY.
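The odd one out can also be found mechanically by comparing the two result sets. A minimal sketch, using plain Python sets of (Node, Remote_IP) pairs copied by hand from the two outputs above rather than the live DataFrames:

```python
# Sessions flagged by bgpSessionCompatibility (output [8]).
incompatible_config = {
    ("as1border1", "3.2.2.2"),
    ("as1border1", "5.6.7.8"),
    ("as1border2", "10.14.22.4"),
    ("as2border2", "2.1.2.2"),
}

# Sessions reported as NOT_COMPATIBLE by bgpSessionStatus (output [7]).
status_not_compatible = {
    ("as1border1", "3.2.2.2"),
    ("as1border1", "5.6.7.8"),
    ("as1border2", "10.14.22.4"),
}

# Flagged as incompatibly configured, yet not NOT_COMPATIBLE in the status output.
unexplained = incompatible_config - status_not_compatible
print(unexplained)
```

The only pair left over is as2border2 with remote IP 2.1.2.2, which matches the candidate identified above.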

Debug UNKNOWN_REMOTE peer on as1border2

Batfish deems the BGP peer 10.14.22.4 not compatible because it cannot find a device in the snapshot that has that IP address configured on any interface. That is why the status is UNKNOWN_REMOTE. This will occur in most networks, since you will not have the configurations of the devices for your external peers (ISPs, content partners, etc.). We can easily verify this by checking the output of bf.q.ipOwners.

[9]:
# check if there is a node in the network that has `10.14.22.4` on an interface
ipOwn = bf.q.ipOwners().answer().frame()
ipOwn[ipOwn['IP']=='10.14.22.4']
[9]:
Node VRF Interface IP Mask Active

Debug LOCAL_IP_UNKNOWN_STATICALLY on as2border2

Now, let’s dig into the sessions that are LOCAL_IP_UNKNOWN_STATICALLY.

An iBGP session will have the status LOCAL_IP_UNKNOWN_STATICALLY if you are missing the update-source command.

So, in this case, the likely issue is that as2border2 is missing the update-source command for the BGP session. This command is needed for iBGP sessions to ensure the peers pick the correct IP address when trying to establish the TCP session. We can find the target remote node by looking at the bf.q.ipOwners output.

[10]:
# find the device(s) that own this IP address
ipOwn[ipOwn['IP'].isin(['2.1.2.2'])]
[10]:
Node VRF Interface IP Mask Active
22 as2core2 default Loopback0 2.1.2.2 32 True

So this session is supposed to be between as2border2 and as2core2. Let’s see if this session is established.

[11]:
# check if either direction of the session can be established
forward = (bgpSessStat['Local_IP'] == '2.1.1.2') & (bgpSessStat['Remote_IP'] == '2.1.2.2')
reverse = (bgpSessStat['Local_IP'] == '2.1.2.2') & (bgpSessStat['Remote_IP'] == '2.1.1.2')
bgpSessStat[forward | reverse]
[11]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
20 as2core2 default 2 None 2.1.2.2 2 as2border2 None 2.1.1.2 ['IPV4_UNICAST'] IBGP ESTABLISHED

This confirms our theory. We are missing the update-source command on as2border2, which prevents it from initiating the BGP session. But since as2core2 is properly configured, the session is established.

Debug NO_LOCAL_IP on as1border1

[12]:
# Identify the BGP sessions on as1border1 that are not compatible with Configured_Status of NO_LOCAL_IP
bgpSessCompat[(bgpSessCompat['Configured_Status'] == 'NO_LOCAL_IP') & (bgpSessCompat['Node']=='as1border1')]
[12]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Configured_Status
1 as1border1 default 1 None None 666 None None 3.2.2.2 [] EBGP_SINGLEHOP NO_LOCAL_IP
2 as1border1 default 1 None None 555 None None 5.6.7.8 [] EBGP_SINGLEHOP NO_LOCAL_IP

When an EBGP_SINGLEHOP session has a Configured_Status of NO_LOCAL_IP, it typically means that no interface on the device is in the same subnet as the BGP peer’s IP address. This can easily be checked by looking at the IP addresses configured on as1border1.

[13]:
# Identify the IP addresses that are owned by as1border1
ipOwn[ipOwn['Node']=='as1border1']
[13]:
Node VRF Interface IP Mask Active
7 as1border1 default GigabitEthernet1/0 10.12.11.1 24 True
14 as1border1 default GigabitEthernet0/0 1.0.1.1 24 True
19 as1border1 default Loopback0 1.1.1.1 32 True
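
The subnet comparison can also be done programmatically. The sketch below uses Python's standard ipaddress module on the three addresses from the output above; `connected_interface` is a hypothetical helper name, and 10.12.11.2 (one of as1border1's neighbors in the bgpProcessConfiguration output) is included as a peer that does sit on a connected subnet.

```python
import ipaddress

# Interface addresses of as1border1, copied from the ipOwners output above.
interfaces = {
    "GigabitEthernet1/0": "10.12.11.1/24",
    "GigabitEthernet0/0": "1.0.1.1/24",
    "Loopback0": "1.1.1.1/32",
}


def connected_interface(peer_ip):
    """Return the name of the interface whose subnet contains peer_ip, or None."""
    peer = ipaddress.ip_address(peer_ip)
    for name, cidr in interfaces.items():
        if peer in ipaddress.ip_interface(cidr).network:
            return name
    return None


for peer_ip in ["3.2.2.2", "5.6.7.8", "10.12.11.2"]:
    print(peer_ip, connected_interface(peer_ip))
```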

As you can see there are no interfaces with addresses that would be in the same subnet as 3.2.2.2 and 5.6.7.8. There could be many explanations for this:

  1. BGP peers were configured before the physical connection to neighboring routers was up.

  2. The user simply configured the wrong BGP peer address.

  3. The interfaces used to exist but were decommissioned, and the BGP config was not cleaned up at the same time.

  4. The session is meant to be an eBGP multi-hop session, but the user didn’t add the ebgp-multihop configuration option and specify an update-source.

Batfish determines the Local_IP for each BGP session in one of two ways: from explicit configuration with update-source (or the equivalent non-IOS command) for iBGP sessions or eBGP multi-hop sessions, or by determining the interface the router will use to send packets towards the BGP peer. The latter method requires the route to the peer to be known.
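The two-step logic described above can be sketched as follows. This is a simplified illustration and not Batfish's actual algorithm: `resolve_local_ip` is a hypothetical function, and the toy routing table contains only the connected routes implied by as1border1's interface addresses, each mapped to the local IP of its outgoing interface.

```python
import ipaddress


def resolve_local_ip(peer_ip, update_source_ip, routes):
    """Toy Local_IP resolution: an explicit update-source wins, else a route lookup.

    routes maps a prefix string to the local IP of the outgoing interface.
    """
    if update_source_ip is not None:
        return update_source_ip
    peer = ipaddress.ip_address(peer_ip)
    table = {ipaddress.ip_network(prefix): ip for prefix, ip in routes.items()}
    candidates = [net for net in table if peer in net]
    if not candidates:
        return None  # no route to the peer, so no Local_IP can be inferred
    best = max(candidates, key=lambda net: net.prefixlen)  # longest-prefix match
    return table[best]


# Connected routes on as1border1, derived from its interface addresses.
as1border1_routes = {
    "10.12.11.0/24": "10.12.11.1",
    "1.0.1.0/24": "1.0.1.1",
    "1.1.1.1/32": "1.1.1.1",
}

print(resolve_local_ip("10.12.11.2", None, as1border1_routes))
print(resolve_local_ip("3.2.2.2", None, as1border1_routes))
# Hypothetical: an explicit update-source pointing at Loopback0.
print(resolve_local_ip("3.2.2.2", "1.1.1.1", as1border1_routes))
```

In this sketch the peer 3.2.2.2 resolves to no local IP unless an update source is given explicitly.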

So, if there is no route to the configured peer, or the configured peer does not exist in the snapshot, you will see this status. We can check the output of the bf.q.routes and bf.q.ipOwners questions to dig into this.

[14]:
# Find the owner of the IP addresses for the incompatible BGP sessions on as1border1
bad_bgp_peer = ['3.2.2.2', '5.6.7.8']
ipOwn[ipOwn['IP'].isin(bad_bgp_peer)]
[14]:
Node VRF Interface IP Mask Active
10 as3border2 default Loopback0 3.2.2.2 32 True

So we can see that 5.6.7.8 does not exist in the network. This either means there is a mis-configuration, or the device is just not expected to be in the snapshot.

Now let’s dig into the peer 3.2.2.2, which we know is the Loopback0 interface on as3border2.

[15]:
# retrieve the routing table entry for as3border2 loopback0 - 3.2.2.2/32 on as1border1
routes = bf.q.routes(network='3.2.2.2/32').answer().frame()
routes[routes['Node']=='as1border1']
[15]:
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag

The specific /32 is not present on as1border1. What about other routers?

[16]:
# retrieve the routing table entry for as3border2 loopback0 - 3.2.2.2/32 on ALL routers in the network
routes
[16]:
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag
0 as3border1 default 3.2.2.2/32 interface GigabitEthernet0/0 ip 3.0.1.2 3.0.1.2 GigabitEthernet0/0 ospf 3 110 None
1 as3border2 default 3.2.2.2/32 interface Loopback0 AUTO/NONE(-1l) Loopback0 connected 0 0 None
2 as3core1 default 3.2.2.2/32 interface GigabitEthernet0/0 ip 3.0.2.1 3.0.2.1 GigabitEthernet0/0 ospf 2 110 None

We can see that this route does not leave as3, which is why as1border1 is unable to establish the configured BGP session. This session should have been configured as an eBGP multi-hop session, with static routes pointing to the appropriate interface and next hop.

Debugging BGP sessions that are NOT_ESTABLISHED

We have root-caused the NOT_COMPATIBLE sessions, now let’s dig into the ones that are NOT_ESTABLISHED.

[17]:
# Find all BGP sessions in the network that were compatible but NOT_ESTABLISHED
bgpSessStat[bgpSessStat['Established_Status'] == 'NOT_ESTABLISHED']
[17]:
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Address_Families Session_Type Established_Status
9 as2border1 default 2 None 2.1.1.1 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
12 as2border2 default 2 None 2.1.1.2 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
15 as2core1 default 2 None 2.1.2.1 2 as2border1 None 2.1.1.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
16 as2core1 default 2 None 2.1.2.1 2 as2border2 None 2.1.1.2 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
17 as2core1 default 2 None 2.1.2.1 2 as2dist1 None 2.1.3.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
18 as2core1 default 2 None 2.1.2.1 2 as2dist2 None 2.1.3.2 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
25 as2dist1 default 2 None 2.1.3.1 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED
28 as2dist2 default 2 None 2.1.3.2 2 as2core1 None 2.1.2.1 ['IPV4_UNICAST'] IBGP NOT_ESTABLISHED

A session that is compatible (UNIQUE_MATCH or DYNAMIC_MATCH) can still fail to get established for two main reasons: 1) missing routes, or 2) an ACL in the path blocking traffic. Let’s check the routing table on as2core1.

[18]:
# retrieve routing table for as2core1 and check the route to the BGP peer 2.1.1.1 - as2border1
routes = bf.q.routes(nodes='as2core1').answer().frame()
routes[routes['Network']=='2.1.1.1/32']
[18]:
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag
0 as2core1 default 2.1.1.1/32 interface GigabitEthernet0/0 ip 2.12.11.1 2.12.11.1 GigabitEthernet0/0 ospf 2 110 None

As we can see as2core1 has a route to the configured neighbor 2.1.1.1. What about the other routers?

[19]:
# retrieve routing table for as2border1 to check if it has a route to as2core1 loopback0 - 2.1.2.1/32
routes=bf.q.routes(nodes='as2border1').answer().frame()
routes[routes['Network']=='2.1.2.1/32']
[19]:
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag

as2border1 does not have a route to the loopback address 2.1.2.1 of as2core1. Let’s also check as2border2.

[20]:
# retrieve routing table for as2border2 to check if it has a route to as2core1 loopback0 - 2.1.2.1/32
routes=bf.q.routes(nodes='as2border2').answer().frame()
routes[routes['Network']=='2.1.2.1/32']
[20]:
Node VRF Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance Tag

Neither border router has a route to the loopback of as2core1. Since the loopback is supposed to be distributed via OSPF, the next step is to look at the OSPF configuration on as2core1.

From the snippet of the configuration of as2core1, we can see that the Loopback address isn’t part of the OSPF process:

interface Loopback0
 ip address 2.1.2.1 255.255.255.255

router ospf 1
 router-id 2.1.2.1
 !network 2.0.0.0 0.255.255.255 area 1
 network 2.12.0.0 0.0.255.255 area 1
 network 2.23.0.0 0.0.255.255 area 1

This explains why the BGP session wasn’t established.
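
To double-check that reading of the network statements, the wildcard matching can be reproduced in a few lines of Python. This is a minimal sketch: `matches_network_statement` is a hypothetical helper, and it assumes the line beginning with `!` in the snippet is a commented-out statement.

```python
import ipaddress


def matches_network_statement(ip, network, wildcard):
    """True if ip matches an IOS-style 'network <network> <wildcard>' statement."""
    ip_int = int(ipaddress.IPv4Address(ip))
    net_int = int(ipaddress.IPv4Address(network))
    # Wildcard bits set to 1 are "don't care"; invert to get the bits that must match.
    care_bits = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (ip_int & care_bits) == (net_int & care_bits)


loopback = "2.1.2.1"
active = [("2.12.0.0", "0.0.255.255"), ("2.23.0.0", "0.0.255.255")]
commented_out = ("2.0.0.0", "0.255.255.255")

# Is the loopback covered by any active network statement?
print(any(matches_network_statement(loopback, n, w) for n, w in active))
# Would the commented-out statement have covered it?
print(matches_network_statement(loopback, *commented_out))
```

Neither active statement covers 2.1.2.1, while the commented-out 2.0.0.0 0.255.255.255 statement would have.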

With that we have root-caused all of the BGP sessions that were NOT_ESTABLISHED or NOT_COMPATIBLE.

Summary

Batfish allows you to easily retrieve information about BGP configuration of all devices and peers, as well as status of each peer. With Batfish you can ensure that no change is pushed to the network that would cause a BGP session to not come up.

We hope you found this notebook useful and informative. Future notebooks will dive into more advanced topics like validating routing policy. Stay tuned!

Want to learn more? Come find us on Slack and GitHub.