Analyzing BGP Route Policies

Route policies for BGP are complex and error prone, which is why some of the biggest outages in the Internet involve misconfigured route policies that end up leaking routes or accepting routes they shouldn’t (e.g., BGP Leak Causing Internet Outages in Japan and Beyond, How a Tiny Error Shut Off the Internet for Parts of the US, Telia engineer error to blame for massive net outage). While it is often clear to network engineers what the route policy should or should not do (e.g., see MANRS guidelines), ensuring that the route policy implementation is correct is notoriously hard.

In this notebook we show how you can use Batfish to validate your route policies. Batfish’s testRoutePolicies question provides an easy way to test route-policy behavior—given a route, it shows how it is transformed (or denied) by the route policy. Batfish’s searchRoutePolicies actively searches for routes that cause a policy to violate its intent.

To illustrate these capabilities, we’ll use an example network with two border routers, named border1 and border2. Each router has a BGP session with a customer network and a BGP session with a provider network. Our goal in this notebook is to validate the in-bound route policy from the customer, called from_customer, and the out-bound route policy to the provider, called to_provider.

The intent of the from_customer route policy is:

  • filter private addresses

  • only permit routes to known prefixes if they have the correct origin AS

  • tag permitted routes with an appropriate community, and update the local preference

The intent of the to_provider route policy is:

  • advertise all prefixes that we own

  • advertise all customer routes

  • don’t advertise anything else Analytics

We’ll start, as usual, by initializing the example network that we will use in this notebook.

[1]:
# Import packages and load questions
%run startup.py
from pybatfish.datamodel.route import BgpRouteConstraints
load_questions()

# Initialize a network and snapshot
NETWORK_NAME = "example_network"
SNAPSHOT_NAME = "example_snapshot"

SNAPSHOT_PATH = "networks/route-analysis"

bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
[1]:
'example_snapshot'

Example 1: Filter private addresses in-bound

When peering with external entities, an almost universally-desired policy is to filter out all announcements of the private IP address space. For our network, we’d like to ensure that the two from_customer route policies properly filter such announcements.

Traditionally you might validate this policy through some form testing that involves the production or a lab device. With Batfish’s testRoutePolicies question we can easily test a route policy’s behavior without access to a device.

[2]:
# Create an example route to use for testing
inRoute1 = BgpRoute(network="10.0.0.0/24",
                    originatorIp="4.4.4.4",
                    originType="egp",
                    protocol="bgp")

# Test how our policy treats this route
result = bfq.testRoutePolicies(policies="from_customer",
                             direction="in",
                             inputRoutes=[inRoute1]).answer().frame()
# Pretty print the result
show(result)
Node Policy_Name Input_Route Action Output_Route Difference
0 border1 from_customer Network: 10.0.0.0/24
AS Path: []
Communities: []
Local Preference: 0
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
DENY None None
1 border2 from_customer Network: 10.0.0.0/24
AS Path: []
Communities: []
Local Preference: 0
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
DENY None None

The first line of code above creates a BgpRoute object that specifies the input route announcement to use for testing, which in this case announces the prefix 10.0.0.0/24, has an originator IP that we arbitrarily chose, and has default values for other parts of the announcement. The second line uses testRoutePolicies to test the behavior of the two from_customer route policies on this announcement.

The output of the question shows the results of the test. In both border routers, the from_customer route policy properly denies this private address.

That result gives us some confidence in our route policies, but it is just a single test. We can run testRoutePolicies on more private addresses to ensure they are denied.

However, how can we be sure that all private addresses are denied by the two in-bound route maps? For that, we will use the searchRoutePolicies question and change our perspective a bit. Instead of testing individual routes, we will ask Batfish to search for a route-policy behavior that violates our intent. If we get one or more results, then we’ve found a bug. If we get no results, then we can be sure that our configurations satisfy the intent, since Batfish explores all possible route-policy behaviors.

[3]:
# Define the space of private addresses
privateIps = ["10.0.0.0/8:8-32",
              "172.16.0.0/28:28-32",
              "192.168.0.0/16:16-32"]

# Specify all route announcements for the private space
inRoutes1 = BgpRouteConstraints(prefix=privateIps)

# Verify that no such announcement is permitted by our policy
result = bfq.searchRoutePolicies(policies="from_customer",
                                 inputConstraints=inRoutes1,
                                 action="permit").answer().frame()
# Pretty print the result
show(result)
Node Policy_Name Input_Route Action Output_Route Difference
0 border2 from_customer Network: 192.168.0.0/32
AS Path: []
Communities: []
Local Preference: 0
Metric: 0
Originator IP: 0.0.0.0
Origin Type: igp
Protocol: bgp
Source Protocol: None
PERMIT Network: 192.168.0.0/32
AS Path: []
Communities: [20:30]
Local Preference: 300
Metric: 0
Originator IP: 0.0.0.0
Origin Type: igp
Protocol: bgp
Source Protocol: None
Communities: [] --> [20:30]
Local Preference: 0 --> 300

The first line above specifies the space of all private IP prefixes. The second line creates a BgpRouteConstraints object, which is like the BgpRoute object we saw earlier but represents a set of announcements rather than a single one. In this case, we are interested in all announcements that announce a prefix in privateIps. Finally, the third line of code uses searchRoutePolicies to search for an announcement in the set inRoutes that is permitted by the from_customer route policy.

There are no results for border1, which means that its from_customer route policy properly filters all private addresses. However, the result for border2 shows that its version of from_customer permits an announcement for the prefix 192.168.0.0/32. The table also shows the route announcement that will be produced by from_customer in this case, along with a “diff” of the input and output announcements.

Inspecting the configurations, we see that both routers deny all announcements for prefixes in the prefix list private-ips. However, the definition of private-ips on border2 accidentally omitted the ge /16 clause, so only applied to /16 prefixes. Relevant parts of the config at border2 are:

ip prefix-list private-ips seq 15 permit 192.168.0.0/16  // <-- missing ge /16

...

route-map from_customer deny 100
 match ip address prefix-list private-ips
!
....

route-map from_customer permit 400
 set community 20:30
 set local-preference 300
!

Batfish is able to correctly model the semantics of route maps and prefix lists and deduce that some prefix with private IPs will get past our policy.

Example 2: Filter based on origin AS in-bound

Another common BGP policy is to make sure that announcements for certain prefixes (e.g., customer-owned prefixes) are acccepted only if they have a specific origin AS.

For our example, we assume that announcements for routes with any prefix in the range 5.5.5.0/24:24-32 should originate from the AS 44. We will use searchRoutePolicies to ask: Is there any permitted announcement for a prefix in the range 5.5.5.0/24:24-32 that does not originate from AS 44?

[4]:
# Define expected prefixes
knownPrefixes = "5.5.5.0/24:24-32"

# Define invalid AS-path -- all those that do not have 44 as the origin AS
badOrigin = "!/( |^)44$/"

# Specify the route announcements we must not permit
inRoutes2 = BgpRouteConstraints(prefix=knownPrefixes, asPath=badOrigin)

# Verify that our policy does not permit any such announcement
result = bfq.searchRoutePolicies(policies="from_customer",
                                 inputConstraints=inRoutes2,
                                 action="permit").answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference

The first line above defines the known prefixes. The second line specifies that we are interested in AS-paths that do not end in 44, using the same syntax that Batfish uses for regular-expression specifiers. Specifically, a regular expression is surrounded by / characters, and the leading ! indicates that we are interested in AS-paths that do not match this regular expression. Since there are no results, we can be sure that the our intent is satisfied.

Example 3: Set attributes in-bound

As the third and final policy for inbound routes, let’s make sure that our from_customer route policies tag each permitted route with the community 20:30 and set the local preference to 300. To start, we can use testRoutePolicies to test this property on a specific route announcement.

[5]:
# Define a test route and test what the policy does to it
inRoute3 = BgpRoute(network="2.0.0.0/8",
                    originatorIp="4.4.4.4",
                    originType="egp",
                    protocol="bgp")
result = bfq.testRoutePolicies(policies="from_customer",
                               direction="in",
                               inputRoutes=[inRoute3]).answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference
0 border1 from_customer Network: 2.0.0.0/8
AS Path: []
Communities: []
Local Preference: 0
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
PERMIT Network: 2.0.0.0/8
AS Path: []
Communities: [20:30]
Local Preference: 0
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
Communities: [] --> [20:30]
1 border2 from_customer Network: 2.0.0.0/8
AS Path: []
Communities: []
Local Preference: 0
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
PERMIT Network: 2.0.0.0/8
AS Path: []
Communities: [20:30]
Local Preference: 300
Metric: 0
Originator IP: 4.4.4.4
Origin Type: egp
Protocol: bgp
Source Protocol: None
Communities: [] --> [20:30]
Local Preference: 0 --> 300

The results show what each router does to the test route. This information can be seen in the Output_Route column, which shows the full output route announcement, as well as the Difference column, which shows the differences between the input and output route announcements. We see that there is an error in border1’s configuration: the permitted route is tagged with community 20:30, but its local preference is not set to 300. However, border2 is doing the right thing. A look at the configuration for border1 reveals that the set local-preference line is accidentally omitted from one clause of the policy.

This example shows why testing is so important. However, we’d like to make sure that there aren’t any other lurking bugs. We can use searchRoutePolicies for this purpose. First we’ll check that all permitted routes are tagged with the community 20:30. To check this property, we will leverage the ability of searchRoutePolicies not only to search for particular input announcements, but also to search for particular output announcements. In this case, we will ask: Is there a permitted route whose output announcement is not tagged with community 20:30?

[6]:
# Define invalid communities -- those that do not contain 20:30
outRoutes3a = BgpRouteConstraints(communities="!20:30")
# Verify that our policy does not output routes with such communities
result = bfq.searchRoutePolicies(policies="from_customer",
                                 action="permit",
                                 outputConstraints=outRoutes3a).answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference

There are no results, so we know that our intent is satisfied on both routers.

Now let’s do a similar thing to check that the local preference is properly set. We’ve already seen that border1 is not properly setting the local preference, so we’ll just check that border2’s configuration is correct.

[7]:
# Verify that all permitted routes have the expected local preference
outRoutes3b = BgpRouteConstraints(localPreference="!300")
result = bfq.searchRoutePolicies(nodes="border2",
                                 policies="from_customer",
                                 action="permit",
                                 outputConstraints=outRoutes3b).answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference

There are no results for border2’s from_customer policy, so we have the strong assurance that it is properly setting the local preference on all permitted routes.

Example 4: Announce your own addresses out-bound

Ok, now let’s validate the to_provider route policies. The first thing we want to ensure is that they allow all our addresses to be advertised. Lets assume that these are addresses in the ranges 1.2.3.0/24:24-32 and 1.2.4.0/24:24-32. We can use searchRoutePolicies to validate this property. Specifically, we ask: Is there an announcement for an address that we own that is denied by ``to_provider``?

[8]:
# Verify that no route for our address space is ever denied by to_provider policies
ownedSpace=["1.2.3.0/24:24-32", "1.2.4.0/24:24-32"]
inRoutes4 = BgpRouteConstraints(prefix=ownedSpace)
result = bfq.searchRoutePolicies(policies="to_provider",
                                 inputConstraints=inRoutes4,
                                 action="deny").answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference

Since there are no results, this implies that no such announcement exists. Hurray!

Example 5: Announce your customers’ routes out-bound

Next we will check that the to_provider route policies are properly announcing our customers’ routes. These are identified by announcements that are tagged with the community 20:30, as we saw earlier.

[9]:
# Verify that no customer routes (i.e., those tagged with the community 20:30) is ever denied
customerCommunities = "20:30"
inRoutes5 = BgpRouteConstraints(communities=customerCommunities)
result = bfq.searchRoutePolicies(policies="to_provider",
                                 inputConstraints=inRoutes5,
                                 action="deny").answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference
0 border2 to_provider Network: 0.0.0.0/0
AS Path: []
Communities: [20:30]
Local Preference: 0
Metric: 0
Originator IP: 0.0.0.0
Origin Type: igp
Protocol: bgp
Source Protocol: None
DENY None None

There are no results for border1, so it permits announcement of all customer routes. But border2 has a bug since it denies some customer routes, a concrete example of which is shown in the Input_Route column.

A look at the configuration reveals that someone has fat-fingered the definition of the community list:

ip community-list cust_community permit 2:30

Such mistakes are difficult to find with any other tool.

Example 6: Don’t advertise anything else out-bound

Last, we want to make sure that our to_provider route policies don’t announce any routes other than the ones we own and the ones that our customers own. We will use searchRoutePolicies to ask: Is there a permitted route whose prefix is not one we own and which is not tagged with the community 20:30?

[10]:
# Set of routes that are neither in our owned space nor have the customer community (20:30)
inRoutes6 = BgpRouteConstraints(prefix=ownedSpace,
                                complementPrefix=True,
                                communities="!20:30")

# Verify that no such route is permitted
result = bfq.searchRoutePolicies(policies="to_provider",
                                 inputConstraints=inRoutes6,
                                 action="permit").answer().frame()
show(result)
Node Policy_Name Input_Route Action Output_Route Difference
0 border2 to_provider Network: 0.0.0.0/0
AS Path: []
Communities: [2:30]
Local Preference: 0
Metric: 0
Originator IP: 0.0.0.0
Origin Type: igp
Protocol: bgp
Source Protocol: None
PERMIT Network: 0.0.0.0/0
AS Path: []
Communities: [2:30]
Local Preference: 0
Metric: 0
Originator IP: 0.0.0.0
Origin Type: igp
Protocol: bgp
Source Protocol: None

The complementPrefix parameter above is used to indicate that we are interested in routes whose prefix is not in ownedSpace.

Since there are no results for border1 we can be sure that it is not advertising any routes that it shouldn’t be. We already saw in the previous example that border2 accidentally advertises routes tagged with 2:30, and that error shows up again here.

Current Status

The testRoutePolicies question supports all of the vendors and route-policy features that are supported by Batfish.

The searchRoutePolicies question has been (at the time of this writing) newly added to Batfish. It supports all of the vendors that are supported by Batfish. The question supports a host of common route policy behaviors and intents, as shown above, but it does not currently support all routing constructs. See its documentation for details, and feel free to reach out to us with questions or specific needs. We’ll continue to enhance its coverage.

Summary

In this notebook we showed you two ways to use Batfish to check whether your route policies meet your intent:
1. The testRoutePolicies question allows you to easily test the behavior of a route policy offline, without access to the live network.
2. The searchRoutePolicies question allows you to search for violations of intent, identifying concrete errors if they exist and providing strong correctness guarantees if not.

Get involved with the Batfish community

Join our community on Slack and GitHub.