Pandas Examples

Batfish questions can return a huge amount of data, which you may want to filter in various ways based on your task. While most Batfish questions support basic filtering, they may not support your desired filtering criteria. Further, for performance, you may want to fetch the answer once and filter it using multiple different criteria. These scenarios are where Pandas-based filtering can help.

Batfish answers can be easily turned into a Pandas DataFrame (using .frame()), after which you can use the full power of Pandas to filter and manipulate data. This notebook provides a few examples of common manipulations for Batfish. It is not intended as a complete guide to Pandas data manipulation.

Let’s first initialize a snapshot that we will use in our examples.

[1]:
# Import packages
%run startup.py
bf = Session(host="localhost")

# Initialize a network and a snapshot
bf.set_network("pandas-example")

SNAPSHOT_NAME = "snapshot"
SNAPSHOT_PATH = "networks/hybrid-cloud/"
bf.init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
[1]:
'snapshot'

Filtering initIssues

After initializing the snapshot, you often want to look at the initIssues answer. If there are too many issues, you may want to ignore a particular class of issues. We show below how to do that.

[2]:
# Let's get the initIssues for our snapshot
issues = bf.q.initIssues().answer().frame()
issues
[2]:
Nodes Source_Lines Type Details Line_Text Parser_Context
0 ['leaf1'] None Convert warning (redflag) Interface Ethernet12 has an undefined channel group Port-Channel20 None None
1 None [configs/Leaf1.cfg:[6], configs/Leaf2.cfg:[6], configs/Leaf3.cfg:[6], configs/Leaf4.cfg:[6], configs/Spine1.cfg:[6], configs/Spine2.cfg:[6]] Parse warning This syntax is unrecognized transceiver qsfp default-mode 4x10G [arista_configuration]
2 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-west-2/VpcEndpointServices.json None None
3 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-east-2/VpcEndpointServices.json None None
[3]:
# Ignore all issues whose Line_Text contains one of these as a substring
line_texts_to_ignore = ["transceiver"]


def has_substring(text: Optional[str], substrings: List[str]) -> bool:
    """Returns True if 'text' is not None and contains one of the 'substrings'"""
    return text is not None and any(substr in text for substr in substrings)


issues[
    issues.apply(
        lambda issue: not has_substring(issue["Line_Text"], line_texts_to_ignore),
        axis=1,
    )
]
[3]:
Nodes Source_Lines Type Details Line_Text Parser_Context
0 ['leaf1'] None Convert warning (redflag) Interface Ethernet12 has an undefined channel group Port-Channel20 None None
2 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-west-2/VpcEndpointServices.json None None
3 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-east-2/VpcEndpointServices.json None None

In the code above, we use the Pandas method apply to map each issue to a boolean value that indicates whether to keep it, that is, whether its Line_Text contains none of the substrings in line_texts_to_ignore. Passing axis=1 makes apply iterate over rows instead of columns. The helper function has_substring makes this determination: it returns True if text is not None and contains any of the substrings. The Python built-in any returns True if any element of the input iterable is true. Using the resulting boolean Series as a filter for issues keeps only the rows that match our criterion.
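
For plain substring tests, a vectorized string method can stand in for the row-wise apply. The following is a minimal sketch on a hand-built DataFrame whose rows are made up for illustration (they are not Batfish output). It assumes that Line_Text holds either strings or None.

```python
import pandas as pd

# Hypothetical stand-in for the initIssues frame; only two columns are modeled.
issues_demo = pd.DataFrame(
    {
        "Details": [
            "undefined channel group",
            "This syntax is unrecognized",
            "Unrecognized element",
        ],
        "Line_Text": [None, "transceiver qsfp default-mode 4x10G", None],
    }
)

# str.contains returns a boolean Series. na=False treats missing text as
# "no match", and regex=False makes the pattern a literal substring.
mask = issues_demo["Line_Text"].str.contains("transceiver", regex=False, na=False)

# ~ inverts the mask, so rows that mention "transceiver" are dropped.
remaining = issues_demo[~mask]
print(remaining)
```

The apply-based approach in the notebook remains the more general one, since it accepts a list of substrings and arbitrary Python logic.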

Instead of ignoring some issues, you may want to focus on issues that match certain criteria. That too is easy to accomplish, as follows.

[4]:
# Only show issues whose details match these substrings
focus_details = ["Unrecognized element 'ServiceDetails' in AWS"]

issues[
    issues.apply(lambda issue: has_substring(issue["Details"], focus_details), axis=1)
]
[4]:
Nodes Source_Lines Type Details Line_Text Parser_Context
2 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-west-2/VpcEndpointServices.json None None
3 None [aws_configs:[]] Parse warning (unimplemented) Unrecognized element 'ServiceDetails' in AWS file aws_configs/us-east-2/VpcEndpointServices.json None None

The code above is similar to the code we used earlier. The only differences are that we pass the focus_details list to the has_substring helper and do not invert its result.

Filtering objects

The `Line_Text` and `Details` columns above have string values, but many Batfish answers contain other data types as well. We generalize the approach above to filter other data types and to filter based on multiple columns. We use the `interfaceProperties` question to demonstrate this.

[5]:
# Fetch interface properties and display its first five rows
interfaces = bf.q.interfaceProperties().answer().frame()
interfaces.head(5)
[5]:
Interface Access_VLAN Active Admin_Up All_Prefixes Allowed_VLANs Auto_State_VLAN Bandwidth Blacklisted Channel_Group Channel_Group_Members DHCP_Relay_Addresses Declared_Names Description Encapsulation_VLAN HSRP_Groups HSRP_Version Inactive_Reason Incoming_Filter_Name MLAG_ID MTU Native_VLAN Outgoing_Filter_Name PBR_Policy_Name Primary_Address Primary_Network Proxy_ARP Rip_Enabled Rip_Passive Spanning_Tree_Portfast Speed Switchport Switchport_Mode Switchport_Trunk_Encapsulation VRF VRRP_Groups Zone_Name
0 __aws-services-gateway__[aws-services] None True True [] True 1e+12 False None [] [] [] To AWS services None [] None None None 1500 None None None link-local:169.254.0.1 None False False False False None False NONE DOT1Q default [] None
1 __aws-services-gateway__[backbone] None True True [] True 1e+12 False None [] [] [] To AWS backbone None [] None None None 1500 None None None link-local:169.254.0.1 None False False False False None False NONE DOT1Q default [] None
2 exitgw[GigabitEthernet1] None True True ['10.10.100.2/24'] True 1e+09 False None [] [] ['GigabitEthernet1'] None None [] None None None 1500 None None None 10.10.100.2/24 10.10.100.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
3 exitgw[GigabitEthernet2] None True True ['10.10.101.2/24'] True 1e+09 False None [] [] ['GigabitEthernet2'] None None [] None None None 1500 None None None 10.10.101.2/24 10.10.101.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
4 exitgw[GigabitEthernet3] None True True ['147.75.69.27/31'] True 1e+09 False None [] [] ['GigabitEthernet3'] None None [] None None None 1500 None None None 147.75.69.27/31 147.75.69.26/31 True False False False 1e+09 False NONE DOT1Q default [] None

To filter based on a column, we need to know its data type. We can learn that from the Batfish documentation or by inspecting the answer we got from Batfish (e.g., using Python’s type() function).
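
One way to inspect the value types is to map type() over each column. The sketch below uses a small made-up DataFrame rather than a real Batfish answer, so the column contents are assumptions chosen only to show the idea.

```python
import pandas as pd

# Hypothetical frame mixing value types, similar in spirit to a Batfish answer.
demo = pd.DataFrame(
    {
        "Active": [True, False],
        "All_Prefixes": [["10.10.100.2/24"], []],
    }
)

# dtypes reports the storage type only; columns holding lists or other
# Python objects show up as "object".
print(demo.dtypes)

# Mapping type() over each column reveals the Python types of the stored values.
value_types = {col: set(demo[col].map(type)) for col in demo.columns}
print(value_types)
```

Once the value type is known, its attributes (such as hostname on an Interface object) can be used inside the filtering lambda.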

We show three examples of filtering based on the Interface and Active columns, which are of type pybatfish.datamodel.primitives.Interface and bool, respectively. The former has hostname and interface properties (which are strings).

[6]:
# Display all interfaces on node 'exitgw'
interfaces[interfaces.apply(lambda row: row["Interface"].hostname == "exitgw", axis=1)]
[6]:
Interface Access_VLAN Active Admin_Up All_Prefixes Allowed_VLANs Auto_State_VLAN Bandwidth Blacklisted Channel_Group Channel_Group_Members DHCP_Relay_Addresses Declared_Names Description Encapsulation_VLAN HSRP_Groups HSRP_Version Inactive_Reason Incoming_Filter_Name MLAG_ID MTU Native_VLAN Outgoing_Filter_Name PBR_Policy_Name Primary_Address Primary_Network Proxy_ARP Rip_Enabled Rip_Passive Spanning_Tree_Portfast Speed Switchport Switchport_Mode Switchport_Trunk_Encapsulation VRF VRRP_Groups Zone_Name
2 exitgw[GigabitEthernet1] None True True ['10.10.100.2/24'] True 1e+09 False None [] [] ['GigabitEthernet1'] None None [] None None None 1500 None None None 10.10.100.2/24 10.10.100.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
3 exitgw[GigabitEthernet2] None True True ['10.10.101.2/24'] True 1e+09 False None [] [] ['GigabitEthernet2'] None None [] None None None 1500 None None None 10.10.101.2/24 10.10.101.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
4 exitgw[GigabitEthernet3] None True True ['147.75.69.27/31'] True 1e+09 False None [] [] ['GigabitEthernet3'] None None [] None None None 1500 None None None 147.75.69.27/31 147.75.69.26/31 True False False False 1e+09 False NONE DOT1Q default [] None
5 exitgw[GigabitEthernet4] None False False [] True 1e+09 False None [] [] ['GigabitEthernet4'] None None [] None Administratively down None None 1500 None None None None None True False False False 1e+09 False NONE DOT1Q default [] None
6 exitgw[Loopback0] None True True ['2.2.2.2/32'] True 8e+09 None None [] [] ['Loopback0'] None None [] None None None 1500 None None None 2.2.2.2/32 2.2.2.2/32 True False False False None False NONE DOT1Q default [] None
7 exitgw[Loopback123] None True True ['192.168.123.7/32'] True 8e+09 None None [] [] ['Loopback123'] None None [] None None None 1500 None None None 192.168.123.7/32 192.168.123.7/32 True False False False None False NONE DOT1Q default [] None
8 exitgw[Tunnel1] None True True ['169.254.25.162/30'] True 100000 None None [] [] ['Tunnel1'] None None [] None None None 1500 None None None 169.254.25.162/30 169.254.25.160/30 True False False False None False NONE DOT1Q default [] None
9 exitgw[Tunnel2] None True True ['169.254.172.2/30'] True 100000 None None [] [] ['Tunnel2'] None None [] None None None 1500 None None None 169.254.172.2/30 169.254.172.0/30 True False False False None False NONE DOT1Q default [] None
10 exitgw[Tunnel3] None True True ['169.254.252.78/30'] True 100000 None None [] [] ['Tunnel3'] None None [] None None None 1500 None None None 169.254.252.78/30 169.254.252.76/30 True False False False None False NONE DOT1Q default [] None
11 exitgw[Tunnel4] None True True ['169.254.215.82/30'] True 100000 None None [] [] ['Tunnel4'] None None [] None None None 1500 None None None 169.254.215.82/30 169.254.215.80/30 True False False False None False NONE DOT1Q default [] None
[7]:
# Display all GigabitEthernet interfaces on node 'exitgw'
interfaces[
    interfaces.apply(
        lambda row: row["Interface"].hostname == "exitgw"
        and row["Interface"].interface.startswith("GigabitEthernet"),
        axis=1,
    )
]
[7]:
Interface Access_VLAN Active Admin_Up All_Prefixes Allowed_VLANs Auto_State_VLAN Bandwidth Blacklisted Channel_Group Channel_Group_Members DHCP_Relay_Addresses Declared_Names Description Encapsulation_VLAN HSRP_Groups HSRP_Version Inactive_Reason Incoming_Filter_Name MLAG_ID MTU Native_VLAN Outgoing_Filter_Name PBR_Policy_Name Primary_Address Primary_Network Proxy_ARP Rip_Enabled Rip_Passive Spanning_Tree_Portfast Speed Switchport Switchport_Mode Switchport_Trunk_Encapsulation VRF VRRP_Groups Zone_Name
2 exitgw[GigabitEthernet1] None True True ['10.10.100.2/24'] True 1e+09 False None [] [] ['GigabitEthernet1'] None None [] None None None 1500 None None None 10.10.100.2/24 10.10.100.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
3 exitgw[GigabitEthernet2] None True True ['10.10.101.2/24'] True 1e+09 False None [] [] ['GigabitEthernet2'] None None [] None None None 1500 None None None 10.10.101.2/24 10.10.101.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
4 exitgw[GigabitEthernet3] None True True ['147.75.69.27/31'] True 1e+09 False None [] [] ['GigabitEthernet3'] None None [] None None None 1500 None None None 147.75.69.27/31 147.75.69.26/31 True False False False 1e+09 False NONE DOT1Q default [] None
5 exitgw[GigabitEthernet4] None False False [] True 1e+09 False None [] [] ['GigabitEthernet4'] None None [] None Administratively down None None 1500 None None None None None True False False False 1e+09 False NONE DOT1Q default [] None
[8]:
# Display all active GigabitEthernet interfaces on node 'exitgw'
interfaces[
    interfaces.apply(
        lambda row: row["Interface"].hostname == "exitgw"
        and row["Interface"].interface.startswith("GigabitEthernet")
        and row["Active"],
        axis=1,
    )
]
[8]:
Interface Access_VLAN Active Admin_Up All_Prefixes Allowed_VLANs Auto_State_VLAN Bandwidth Blacklisted Channel_Group Channel_Group_Members DHCP_Relay_Addresses Declared_Names Description Encapsulation_VLAN HSRP_Groups HSRP_Version Inactive_Reason Incoming_Filter_Name MLAG_ID MTU Native_VLAN Outgoing_Filter_Name PBR_Policy_Name Primary_Address Primary_Network Proxy_ARP Rip_Enabled Rip_Passive Spanning_Tree_Portfast Speed Switchport Switchport_Mode Switchport_Trunk_Encapsulation VRF VRRP_Groups Zone_Name
2 exitgw[GigabitEthernet1] None True True ['10.10.100.2/24'] True 1e+09 False None [] [] ['GigabitEthernet1'] None None [] None None None 1500 None None None 10.10.100.2/24 10.10.100.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
3 exitgw[GigabitEthernet2] None True True ['10.10.101.2/24'] True 1e+09 False None [] [] ['GigabitEthernet2'] None None [] None None None 1500 None None None 10.10.101.2/24 10.10.101.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
4 exitgw[GigabitEthernet3] None True True ['147.75.69.27/31'] True 1e+09 False None [] [] ['GigabitEthernet3'] None None [] None None None 1500 None None None 147.75.69.27/31 147.75.69.26/31 True False False False 1e+09 False NONE DOT1Q default [] None
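
The three filters above can also be written by building one boolean Series per condition and combining them with the element-wise & operator. This is a sketch under stated assumptions: the Iface class below is a hypothetical stand-in for pybatfish.datamodel.primitives.Interface, modeling only its hostname and interface properties, and the rows are made up.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Iface:
    """Hypothetical stand-in for pybatfish.datamodel.primitives.Interface."""

    hostname: str
    interface: str


demo = pd.DataFrame(
    {
        "Interface": [
            Iface("exitgw", "GigabitEthernet1"),
            Iface("exitgw", "GigabitEthernet4"),
            Iface("exitgw", "Loopback0"),
            Iface("leaf1", "Ethernet1"),
        ],
        "Active": [True, False, True, True],
    }
)

# Extract plain-string Series once so each condition is a simple comparison.
hostnames = demo["Interface"].map(lambda iface: iface.hostname)
names = demo["Interface"].map(lambda iface: iface.interface)

# & is the element-wise AND; each operand is a boolean Series.
mask = (
    (hostnames == "exitgw")
    & names.str.startswith("GigabitEthernet")
    & demo["Active"]
)
print(demo[mask])
```

Keeping the intermediate Series around makes it cheap to reuse them when trying several different filters on the same answer.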

Filtering columns

When viewing Batfish answers, you may want to view only some of the columns. Pandas makes that easy for both original answers and answers where some rows have been filtered, as both of them are just DataFrames.

[9]:
# Filter interfaces to all active GigabitEthernet interfaces on node exitgw
exitgw_gige_active_interfaces = interfaces[
    interfaces.apply(
        lambda row: row["Interface"].hostname == "exitgw"
        and row["Interface"].interface.startswith("GigabitEthernet")
        and row["Active"],
        axis=1,
    )
]
# Display only the Interface and All_Prefixes columns of the filtered DataFrame
exitgw_gige_active_interfaces[["Interface", "All_Prefixes"]]
[9]:
Interface All_Prefixes
2 exitgw[GigabitEthernet1] ['10.10.100.2/24']
3 exitgw[GigabitEthernet2] ['10.10.101.2/24']
4 exitgw[GigabitEthernet3] ['147.75.69.27/31']
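
Row and column filtering can also be combined in a single step with .loc. The sketch below uses a made-up DataFrame with string interface names and assumed values, so it illustrates the indexing pattern rather than real Batfish output.

```python
import pandas as pd

# Hypothetical data; values are chosen for illustration only.
demo = pd.DataFrame(
    {
        "Interface": [
            "exitgw[GigabitEthernet1]",
            "exitgw[GigabitEthernet4]",
            "leaf1[Ethernet1]",
        ],
        "Active": [True, False, True],
        "All_Prefixes": [["10.10.100.2/24"], [], ["10.0.0.1/31"]],
        "MTU": [1500, 1500, 9000],
    }
)

# .loc[rows, columns] applies a boolean row mask and a column list together.
active_prefixes = demo.loc[demo["Active"], ["Interface", "All_Prefixes"]]
print(active_prefixes)
```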

Counting rows

Often, you will want to count the number of rows in the filtered answer. This is easy because Python’s built-in len() function works on DataFrames as well and returns the number of rows.

[10]:
# Show the number of rows in the filtered DataFrame that we obtained above
len(exitgw_gige_active_interfaces)
[10]:
3
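
There are a few equivalent ways to obtain the same count, sketched below on a made-up single-column DataFrame. Summing a boolean mask is handy when only the count is needed and the filtered frame is not.

```python
import pandas as pd

# Hypothetical data; only the Active column is modeled.
demo = pd.DataFrame({"Active": [True, False, True, True]})

mask = demo["Active"]
filtered = demo[mask]

# All three expressions give the number of matching rows.
print(len(filtered))      # rows in the filtered DataFrame
print(filtered.shape[0])  # first element of the (rows, columns) tuple
print(int(mask.sum()))    # True counts as 1, so the sum is the match count
```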

Grouping rows

For operations more advanced than filtering rows and columns, you will likely find the Pandas groupby method handy. It lets you group rows using custom criteria and analyze those groups. For instance, to group interfaces by node, you can do the following:

[11]:
# Get interfaces grouped by node name
intefaces_by_hostname = interfaces.groupby(
    lambda index: interfaces.loc[index]["Interface"].hostname
)

We obtained a Pandas DataFrameGroupBy object above. When given a function, the groupby method calls it on each row index label (unlike apply with axis=1, which passes each row) and groups the rows whose index labels yield the same value. In our example, the lambda first gets the row using interfaces.loc[index], then gets the interface (which is of type pybatfish.datamodel.primitives.Interface), and finally the hostname.
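
An alternative to grouping with a function over index labels is to compute the grouping key as a Series first and pass that Series to groupby. The following is a sketch, not the notebook's method: Iface is a hypothetical stand-in for pybatfish.datamodel.primitives.Interface and the rows are made up.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Iface:
    """Hypothetical stand-in for pybatfish.datamodel.primitives.Interface."""

    hostname: str
    interface: str


demo = pd.DataFrame(
    {
        "Interface": [
            Iface("exitgw", "GigabitEthernet1"),
            Iface("exitgw", "Loopback0"),
            Iface("leaf1", "Ethernet1"),
        ]
    }
)

# A Series aligned with the frame's index can serve as the grouping key.
# Renaming it avoids a name clash with the existing Interface column.
hostnames = demo["Interface"].map(lambda iface: iface.hostname).rename("hostname")
by_hostname = demo.groupby(hostnames)

# size() reports the number of rows in each group.
group_sizes = by_hostname.size()
print(group_sizes)
```

This variant avoids one .loc lookup per row, which may matter for large answers.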

DataFrameGroupBy objects offer many functions that are useful for analysis. We demonstrate two of them below.

[12]:
# Display the rows corresponding to node 'exitgw' group
intefaces_by_hostname.get_group("exitgw")
[12]:
Interface Access_VLAN Active Admin_Up All_Prefixes Allowed_VLANs Auto_State_VLAN Bandwidth Blacklisted Channel_Group Channel_Group_Members DHCP_Relay_Addresses Declared_Names Description Encapsulation_VLAN HSRP_Groups HSRP_Version Inactive_Reason Incoming_Filter_Name MLAG_ID MTU Native_VLAN Outgoing_Filter_Name PBR_Policy_Name Primary_Address Primary_Network Proxy_ARP Rip_Enabled Rip_Passive Spanning_Tree_Portfast Speed Switchport Switchport_Mode Switchport_Trunk_Encapsulation VRF VRRP_Groups Zone_Name
2 exitgw[GigabitEthernet1] None True True ['10.10.100.2/24'] True 1e+09 False None [] [] ['GigabitEthernet1'] None None [] None None None 1500 None None None 10.10.100.2/24 10.10.100.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
3 exitgw[GigabitEthernet2] None True True ['10.10.101.2/24'] True 1e+09 False None [] [] ['GigabitEthernet2'] None None [] None None None 1500 None None None 10.10.101.2/24 10.10.101.0/24 True False False False 1e+09 False NONE DOT1Q default [] None
4 exitgw[GigabitEthernet3] None True True ['147.75.69.27/31'] True 1e+09 False None [] [] ['GigabitEthernet3'] None None [] None None None 1500 None None None 147.75.69.27/31 147.75.69.26/31 True False False False 1e+09 False NONE DOT1Q default [] None
5 exitgw[GigabitEthernet4] None False False [] True 1e+09 False None [] [] ['GigabitEthernet4'] None None [] None Administratively down None None 1500 None None None None None True False False False 1e+09 False NONE DOT1Q default [] None
6 exitgw[Loopback0] None True True ['2.2.2.2/32'] True 8e+09 None None [] [] ['Loopback0'] None None [] None None None 1500 None None None 2.2.2.2/32 2.2.2.2/32 True False False False None False NONE DOT1Q default [] None
7 exitgw[Loopback123] None True True ['192.168.123.7/32'] True 8e+09 None None [] [] ['Loopback123'] None None [] None None None 1500 None None None 192.168.123.7/32 192.168.123.7/32 True False False False None False NONE DOT1Q default [] None
8 exitgw[Tunnel1] None True True ['169.254.25.162/30'] True 100000 None None [] [] ['Tunnel1'] None None [] None None None 1500 None None None 169.254.25.162/30 169.254.25.160/30 True False False False None False NONE DOT1Q default [] None
9 exitgw[Tunnel2] None True True ['169.254.172.2/30'] True 100000 None None [] [] ['Tunnel2'] None None [] None None None 1500 None None None 169.254.172.2/30 169.254.172.0/30 True False False False None False NONE DOT1Q default [] None
10 exitgw[Tunnel3] None True True ['169.254.252.78/30'] True 100000 None None [] [] ['Tunnel3'] None None [] None None None 1500 None None None 169.254.252.78/30 169.254.252.76/30 True False False False None False NONE DOT1Q default [] None
11 exitgw[Tunnel4] None True True ['169.254.215.82/30'] True 100000 None None [] [] ['Tunnel4'] None None [] None None None 1500 None None None 169.254.215.82/30 169.254.215.80/30 True False False False None False NONE DOT1Q default [] None

Here, we used the get_group method to get all information for ‘exitgw’, thus viewing all interfaces for that node. Row filtering can produce the same result, but grouping also enables analyses that row filtering does not, such as:

[13]:
# Display the number of interfaces per node
intefaces_by_hostname.count()[["Interface"]]
[13]:
Interface
__aws-services-gateway__ 2
exitgw 10
i-01602d9efaed4409a 1
i-02cae6eaa9edeed70 1
i-04cd3db5124a05ee6 1
i-0a5d64b8b58c6dd09 1
igw-02fd68f94367a67c7 2
igw-0a8309f3192e7cea3 2
internet 3
isp_16509 6
isp_65200 2
leaf1 15
leaf2 13
leaf3 13
leaf4 14
spine1 18
spine2 18
srv-101 1
subnet-009d57c7f13813630 4
subnet-0333a0749ea4ce3df 4
subnet-03acae3b9a534fff9 3
subnet-06005943afe32f714 4
subnet-06a692ed4ef84368d 4
subnet-09b389def558a9c7d 3
subnet-0cb5f4c094bee5214 3
subnet-0f84a4be105f7aaef 3
tgw-06b348adabd13452d 8
tgw-0888a76c8a371246d 8
vpc-00157b5941bfd4959 5
vpc-00b65e98077106059 8
vpc-0276455718806058a 5
vpc-0574d08f8d05917e4 8

In this example, we used the count method, which counts non-null entries for each column in the group. We then selected the Interface column to see the number of interfaces per node.
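
Because count skips null entries, its result depends on which column is inspected, whereas size counts rows regardless of their contents. The sketch below shows the difference on a made-up DataFrame; the Node and Description columns and their values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with some missing descriptions.
demo = pd.DataFrame(
    {
        "Node": ["exitgw", "exitgw", "leaf1"],
        "Description": ["uplink", None, None],
    }
)
groups = demo.groupby("Node")

# count() tallies non-null values per column within each group.
non_null_counts = groups.count()
print(non_null_counts)

# size() tallies rows per group, whether or not values are missing.
row_counts = groups.size()
print(row_counts)
```

Counting on the Interface column works in the notebook because that column is never null, so the two methods would agree there.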

Summary

In this notebook, we showed how you can use Pandas methods to manipulate Batfish answers, including filtering rows, filtering columns, and grouping rows. Hopefully, these examples help you get started with your analyses. Find us on Slack (link below) if you have questions.


Get involved with the Batfish community

Join our community on Slack and GitHub.