Host Settings
This page contains settings for Linux servers; these instructions are agnostic of the Linux distribution used by the servers. The commands use the ip utility from the iproute2 package.
These settings do not persist across reboots. Consult the documentation for the Linux distribution for guidance on how to persist the settings on the server, for example with Netplan or NetworkManager. For additional details on options and behavior, consult the kernel bonding driver documentation.
MCLAG / ESLAG
The multi-chassis LAG (MCLAG) architecture is a way to provide device redundancy in a network. At the physical layer, an MCLAG topology is a single server connected to two different switches, and those switches are directly connected to each other in addition to being connected to the rest of the fabric.
ESLAG is a similar technology to MCLAG, with the beneficial difference that the switches do not need to be directly connected to each other. There can be up to 4 switches in an ESLAG group, whereas MCLAG is always two switches.
Regardless of whether MCLAG or ESLAG is chosen, the host must configure its two (or more) ports using LACP (IEEE 802.3ad).
Server Settings
The supported server-side setting for bonding is 802.3ad, with hashing policy layer2 or layer2+3. layer2+3 transmit hashing is the recommended setting for the overwhelming majority of scenarios.
To configure this in Linux:
sudo ip link add $bond_name type bond mode 802.3ad miimon 50 xmit_hash_policy layer2+3
sudo ip link set $slave1_name down   # member interfaces must be down before enslaving
sudo ip link set $slave2_name down
sudo ip link set $slave1_name master $bond_name
sudo ip link set $slave2_name master $bond_name
sudo ip link set $bond_name up
The miimon 50 option indicates that the bonding driver will use MII link monitoring to determine whether the link is up, polling every 50 milliseconds.
layer2 or layer2+3 is the hashing mode, which controls the input into the hash function. See the kernel bonding driver documentation for additional details.
After the bond has been created, it can receive a DHCP address or be statically assigned, for example:
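sudo ip addr add 192.0.2.10/24 dev $bond_name   # example static address; substitute your own
sudo dhclient $bond_name                        # or use DHCP, assuming a client such as dhclient is installed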
To check the status of this bond:
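# The bonding driver reports detailed state through procfs:
cat /proc/net/bonding/$bond_name
# A link-level view that includes bond details:
ip -d link show $bond_name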
LAG, Bonding, Teaming, Bundling
A LAG configuration can be used to increase the bandwidth to and from an end host. For example, if a server has two 25 Gbps ports, the ports can be bonded to provide an aggregate 50 Gbps of available bandwidth. At the physical layer, two cables run from the server to a single switch.
To enable fast detection of faults, 802.3ad is the supported mode for bonds attaching to the Open Network Fabric. The steps to configure the bond are the same as the server settings above.
The kernel bonding driver documentation discusses other modes for bonding. Several of those modes don't require cooperation on the switch side, so there is no fabric configuration needed. The primary concerns for using these modes are out-of-order TCP delivery, fault detection, and correction time.
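For instance, a balance-alb bond is one such mode; the following is a sketch with an illustrative miimon value:
# balance-alb balances traffic via ARP negotiation and needs no switch-side configuration:
sudo ip link add $bond_name type bond mode balance-alb miimon 50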
Active / Passive Fault Tolerance
Some networking equipment doesn't support MCLAG or ESLAG. In this case, to enable fault-tolerant availability, configure a bond on the Linux server in active / standby mode. In this topology, a link is connected from server-1 to switch-1, and another link is connected from server-1 to switch-2. There is neither an MCLAG connection nor a switch group configuration applied to the switches. When adding these links to the fabric via the kubectl command or the wiring diagram, use the unbundled connection kind.
Active / Passive Bond Settings
The IP addresses used in the following commands are examples. On the host:
- Create a bond:
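The exact commands depend on the environment; the following is a sketch using the interface names and bond parameters shown in the output below, with an example address:
sudo ip link add bond0 type bond mode active-backup miimon 50
sudo ip link set enp2s1 down                      # members must be down before enslaving
sudo ip link set enp2s2 down
sudo ip link set enp2s1 master bond0
sudo ip link set enp2s2 master bond0
sudo ip link set bond0 type bond primary enp2s1   # prefer enp2s1 as the primary member
sudo ip link set bond0 up
sudo ip addr add 192.0.2.10/24 dev bond0          # example address; substitute your own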
Print the link layer information for the interfaces to see that the same MAC address is on all three interfaces: bond0, enp2s1, and enp2s2:
core@server-1 ~ $ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether a6:eb:64:de:95:92 brd ff:ff:ff:ff:ff:ff permaddr a0:36:9f:3f:1a:e8
3: enp2s2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
link/ether a6:eb:64:de:95:92 brd ff:ff:ff:ff:ff:ff permaddr 14:02:ec:7f:3a:3c
4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 0c:20:12:fe:01:00 brd ff:ff:ff:ff:ff:ff
10: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a6:eb:64:de:95:92 brd ff:ff:ff:ff:ff:ff
Confirm that the bonding driver is reporting the selected primary interface:
core@server-1 ~ $ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.6.74-flatcar
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: enp2s1 (primary_reselect always)
Currently Active Slave: enp2s1
MII Status: up
MII Polling Interval (ms): 50
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Slave Interface: enp2s1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:3f:1a:e8
Slave queue ID: 0
Slave Interface: enp2s2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 14:02:ec:7f:3a:3c
Slave queue ID: 0
Failure Scenarios
When miimon detects that the primary member interface is no longer up, the bonding driver sends five gratuitous ARP messages via the secondary member interface to notify the fabric that the IP address is now reachable through the other switch. The fabric will start routing traffic to this location, and the host will start receiving traffic.
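To exercise this failover path on a test host, the active member can be administratively downed; this hypothetical test uses the interfaces from the example above:
sudo ip link set enp2s1 down                           # simulate a link failure on the primary member
grep "Currently Active Slave" /proc/net/bonding/bond0  # should now report enp2s2
sudo ip link set enp2s1 up                             # restore the primary link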
If the miimon polling determines that the primary member interface link is up again, the bonding driver will switch back to the primary member interface. The primary_reselect behavior is configurable; see the kernel bonding driver documentation for additional details.
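For example, assuming iproute2's bond primary_reselect option, failback can be restricted so the bond returns to the primary only if the current active member fails:
sudo ip link set bond0 type bond primary_reselect failure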
The changeover process will result in dropped packets, but it should not result in TCP timeouts.
VLANs
A Linux host is capable of receiving tagged and untagged traffic. Tagged traffic has the VLAN (802.1q) header intact; untagged traffic has the VLAN header removed as it exits the switch.
The Open Network Fabric is capable of emitting tagged and untagged traffic. The setting that changes whether traffic is tagged or untagged is on the VPCAttachment, specifically the nativeVLAN field.
The default value of this field is false, meaning the fabric will emit tagged traffic. If a port emits tagged traffic, the Linux host must handle the tag. This is accomplished by creating a link layer interface on the host for the tag:
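# Create an interface for VLAN 1001 on enp2s1 (names match the example below):
sudo ip link add link enp2s1 name enp2s1.1001 type vlan id 1001
sudo ip link set enp2s1.1001 up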
The name enp2s1.1001 is a convention; it can be customized. The VLAN in this example is 1001 and was created when the subnet was created inside the VPC.
When the nativeVLAN field is true, the switch will remove the 802.1q tag from the packet as it exits the switch. When a Linux server is attached to this port, there is no need to create an additional link layer device.
Multiple VLANs on a port
At times, it is desirable to configure a single port that emits more than one VLAN tag, which is called VLAN trunking. To create a trunk port, attach the VPC subnet to the desired port; multiple VPC subnets can be attached to a single connection object. On the host, each VLAN carried by the trunk is then handled with its own link layer interface, as shown below.
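A minimal sketch, with hypothetical VLAN IDs 1001 and 1002, using the bond from the earlier examples:
# One link layer interface per VLAN carried on the trunk:
sudo ip link add link bond0 name bond0.1001 type vlan id 1001
sudo ip link add link bond0 name bond0.1002 type vlan id 1002
sudo ip link set bond0.1001 up
sudo ip link set bond0.1002 up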