- MTU
- Unicast Routing Protocol
- IP addressing
- Multicast for BUM traffic replication
VxLAN adds 50 bytes of overhead to the original Ethernet frame, which needs to be catered for to avoid fragmentation. The simplest way of doing this is to enable jumbo frames in the IP network where VxLAN will run. As most servers utilise a jumbo frame size of 9000 bytes, it is recommended that the switches be configured with a jumbo frame size of 9192 or 9216, depending on what the hardware model supports. This caters for the servers' 9000 bytes plus the VxLAN overhead.
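As a minimal sketch of what this looks like on NX-OS (the interface name is illustrative, and some Nexus models set Layer 2 jumbo frames via a network-qos policy instead), the MTU is raised globally and/or per interface:

system jumbomtu 9216

interface Ethernet1/43
mtu 9216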
The next consideration is which IGP (unicast routing protocol) to utilise, however as mentioned this post will focus on OSPF.
IP addressing for the underlay needs to cater for the P2P links between the spine and leaf switches, the loopback interfaces on each spine and leaf switch and the multicast Rendezvous-Point (RP) address.
Whilst discussed in more detail later in this post, it should be noted that the mode of multicast utilised will likely depend on the hardware model in use. For example, on the Cisco Nexus range, unfortunately, not all models support the same multicast mode. Below is a list of what is supported on each Nexus model:
- Nexus 1000v – IGMP v2/v3
- Nexus 3000 – PIM ASM
- Nexus 5600 – PIM BiDir
- Nexus 7000/F3 – PIM ASM / PIM BiDir
- Nexus 9000 – PIM ASM
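Whichever mode is chosen shows up in the RP configuration. As a rough sketch only (the RP address and group range here are illustrative and not part of this build), an ASM RP versus a BiDir RP on NX-OS would look something like:

! ASM
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8
! BiDir
ip pim rp-address 192.168.100.1 group-list 239.0.0.0/8 bidir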
In this example we will leverage the loopback addresses for our multicast RP address. However, as an illustration, a medium-sized spine and leaf deployment utilising 4 spine switches and 20 leaf switches would need to consider the following IP address usage:
- 4 Spine x 20 leaf = 80 P2P Links
- 80 links, with an IP address at each end = 160 P2P IP addresses
- 24 devices in total = 24 Loopback IP addresses
- Total = 160 P2P IP + 24 Loopback IP = 184 IP Addresses
Also note that, to conserve IP addresses, 'ip unnumbered loopback0' may be used on the P2P interfaces, which means 1 IP address per device (a sketch follows below). This should be seriously considered for large deployments. However, for simplicity, in this example I am going to utilise 2 spine switches and 3 leaf switches with a unique IP address everywhere, meaning I need to cater for:
2 spine x 3 leaf = 6 spine-leaf pairs; with 2 x 10G links per pair (as used later in this post) that is 12 P2P links = 24 P2P IP addresses, plus 6 loopback IP addresses (one per device plus the shared Anycast-RP address).
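For reference, a minimal sketch of the IP unnumbered option mentioned above (the interface is illustrative; on NX-OS the Ethernet interface generally needs to be set to point-to-point medium before it can borrow the loopback address):

interface Ethernet1/1
medium p2p
ip unnumbered loopback0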
Also, I am going to assume in this example that the servers are utilising the 10/8 IP address range, so I have opted to use the 192.168/16 range for the loopback interfaces (which also serve as the router IDs) and the 172.16/12 range for the physical Layer 3 P2P interfaces.
Also for reference, whilst most of the theory is independent of the vendor and hardware, in this example I am using Cisco Nexus 9000 switches to implement this network technology. As with all Nexus switches, the features first need to be enabled, thus I have enabled the following:
Spine-1#show run | incl feature
feature nxapi
feature ospf
feature bgp
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature lldp
feature nv overlay
ip pim rp-address 192.168.1.0
ip pim anycast-rp 192.168.1.0 192.168.1.1
ip pim anycast-rp 192.168.1.0 192.168.1.2
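The three commands above define 192.168.1.0 as the shared RP address and list both spine loopbacks (192.168.1.1 and 192.168.1.2) as members of the Anycast-RP set. For this to work, the shared RP address typically also needs to exist on each spine, usually on an extra loopback advertised into the underlay IGP; a minimal sketch, assuming loopback1 is unused (the OSPF and PIM settings mirror those applied to loopback0 further below):

interface loopback1
description Anycast-RP
ip address 192.168.1.0/32
ip router ospf UNDERLAY area 0.0.0.0
ip pim sparse-mode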
interface loopback0
description Router-ID – Spine1
ip address 192.168.1.1/32

router ospf UNDERLAY
router-id 192.168.1.1
log-adjacency-changes
maximum-paths 12
auto-cost reference-bandwidth 100000 Mbps
passive-interface default
The router ID is the same IP address I will assign to the loopback0 interface, and the same value will be used for all router IDs defined on this switch.
The OSPF configuration is standard and should be familiar to anyone who has configured OSPF before; however, the command 'maximum-paths' may not be. This is enabled to provide Equal Cost Multi-Pathing (ECMP) between my leaf and spine switches. I have chosen 12 simply to have a large number that I will likely never need to revisit, but as long as the value is equal to, or greater than, the number of physical links it will be fine. It is also good practice to define the reference bandwidth, and in this example I have configured 100000 Mbps, which is 100 Gbps and should cater for the largest link this environment will have. Finally, I prefer to manually nominate any interfaces I wish to participate in OSPF, so I have configured interfaces to be passive by default.
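As a quick worked example of what that reference bandwidth means for link costs (OSPF cost is the reference bandwidth divided by the interface bandwidth, truncated to an integer with a minimum of 1):

10G link:  100000 / 10000  = 10
40G link:  100000 / 40000  = 2 (2.5 rounded down)
100G link: 100000 / 100000 = 1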
Once this is done I can go back into the loopback interface and assign the OSPF and multicast parameters so that the loopback interface participates in these protocols, with the following configuration:
interface loopback0
description Router-ID – Spine1
ip address 192.168.1.1/32
ip ospf network point-to-point
ip router ospf UNDERLAY area 0.0.0.0
ip pim sparse-mode
Next, the physical point-to-point interfaces towards the leaf switches are configured in a similar fashion:

interface Ethernet1/43
description – DC01-LSL06-03 [Eth1/47]
mtu 9216
ip address 172.16.1.1/30
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf UNDERLAY area 0.0.0.0
ip pim sparse-mode
no shutdown
It's important to configure OSPF as point-to-point here to ensure there is no DR/BDR (and thus no election), as well as keeping the LSA database more optimised and avoiding a full SPF calculation for a link failure. Also, as we have nominated passive-interface default in OSPF, we need to enable this interface to participate in OSPF with the command 'no ip ospf passive-interface'. I have also used a /30 for the point-to-point link, which is not ideal for preserving IP address space and may cause scale issues in a very large deployment, but for simplicity of configuration and troubleshooting I've decided the trade-off here is fine.
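If address space were a concern, a /31 per link (RFC 3021) would halve the point-to-point usage; a minimal sketch of what that would look like on the same interface (illustrative only, not used in this build):

interface Ethernet1/43
ip address 172.16.1.0/31
ip ospf network point-to-point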
All the interconnects between the leaf and spine switches are via 2 x 10G interfaces, so I need to replicate the above configuration on an additional interface, as per the following:
interface Ethernet1/44
description – DC01-LSL06-03 [Eth1/48]
mtu 9216
ip address 172.16.1.5/30
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf UNDERLAY area 0.0.0.0
ip pim sparse-mode
no shutdown
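The leaf end of each link mirrors this configuration. As a sketch of the matching interface on leaf DC01-LSL06-03 (the interface and address are inferred from the neighbour output below, so treat this as illustrative rather than a capture from the device):

interface Ethernet1/47
description – Spine-1 [Eth1/43]
mtu 9216
ip address 172.16.1.2/30
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf UNDERLAY area 0.0.0.0
ip pim sparse-mode
no shutdown

With both ends of each link configured, the OSPF and PIM adjacencies can be verified from the spine: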
Spine-1# show ip ospf neighbors
 OSPF Process ID UNDERLAY VRF default
 Total number of neighbors: 6
 Neighbor ID     Pri State            Up Time  Address         Interface
 192.168.1.13      1 FULL/ -          1w5d     172.16.1.2      Eth1/43
 192.168.1.13      1 FULL/ -          1w5d     172.16.1.6      Eth1/44
 192.168.1.12      1 FULL/ -          1w5d     172.16.1.10     Eth1/45
 192.168.1.12      1 FULL/ -          1w5d     172.16.1.14     Eth1/46
 192.168.1.11      1 FULL/ -          1w5d     172.16.1.18     Eth1/47
 192.168.1.11      1 FULL/ -          1w5d     172.16.1.22     Eth1/48
Spine-1# show ip pim neighbor
PIM Neighbor Status for VRF "default"
Neighbor        Interface       Uptime   Expires   DR        Bidir-    BFD
                                                   Priority  Capable   State
172.16.1.2      Ethernet1/43    1w5d     00:01:42  1         yes       n/a
172.16.1.6      Ethernet1/44    1w5d     00:01:35  1         yes       n/a
172.16.1.10     Ethernet1/45    1w5d     00:01:26  1         yes       n/a
172.16.1.14     Ethernet1/46    1w5d     00:01:23  1         yes       n/a
172.16.1.18     Ethernet1/47    1w5d     00:01:34  1         yes       n/a
172.16.1.22     Ethernet1/48    1w5d     00:01:44  1         yes       n/a
Spine-1# show ip pim interface brief
PIM Interface Status for VRF "default"
Interface            IP Address      PIM DR Address  Neighbor  Border
                                                     Count     Interface
Ethernet1/43         172.16.1.1      172.16.1.2      1         no
Ethernet1/44         172.16.1.5      172.16.1.6      1         no
Ethernet1/45         172.16.1.9      172.16.1.10     1         no
Ethernet1/46         172.16.1.13     172.16.1.14     1         no
Ethernet1/47         172.16.1.17     172.16.1.18     1         no
Ethernet1/48         172.16.1.21     172.16.1.22     1         no
loopback0            192.168.1.1     192.168.1.1     0         no
Note: As this output is from a spine switch, and each spine has 2 x 10G links to each of the 3 leaf switches, there are 6 entries above, plus the loopback depending on which command is used.
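Before moving on, a simple check that underlay reachability is consistent is to ping each leaf loopback sourced from the spine loopback (the addresses are taken from the output above), repeating the test while individual member links are shut down:

ping 192.168.1.11 source 192.168.1.1
ping 192.168.1.12 source 192.168.1.1
ping 192.168.1.13 source 192.168.1.1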
This completes the underlay network with OSPF and multicast, and we can now build the overlay and control plane on top of it. It is critical that reachability across the underlay is consistent throughout the fabric, so this is a good point to test failure scenarios such as the link failures mentioned above. It is also a good point to finish this blog, with the next post providing the overlay and control plane configuration details.
