I’ve decided to start studying for the CCNP Data Center. I’m quite familiar with the B-Series, with plenty of hands-on experience and previous CCIE Data Center studies, but other than some UC deployments I’ve not touched the C-Series much. One of the first questions that came up for me was: how does CIMC-level (Cisco Integrated Management Controller, similar to HP iLO or Dell DRAC) active/active redundancy work?
For those unfamiliar, within the CIMC configuration for a C200 M2 server we have the option to use a dedicated 10/100 management port, the two onboard 1Gbps LOM (LAN on Motherboard) NICs, or an optional P81E VIC (a fancy interface virtualization card). With the LOM option we can select no NIC redundancy, active/passive, or active/active. The default configuration is LOM with active/active redundancy.
From the networking side, active/active load balancing typically means static or LACP-based link aggregation. That seemed an unlikely default, since both methods require specific switch support and configuration. The C200 M2 install guide gave no hints at all as to how the load balancing was achieved, so I decided to look at the newer C200 M3 documentation. There I found the following explanation:
The active/active setting uses Mode 5 or Balance-TLB (adaptive transmit load balancing). This is channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
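To make the transmit side of that description concrete, here is a minimal Python sketch of the slave-selection idea: queue each outgoing frame on the slave with the lowest load relative to its link speed. This is an illustrative model only, not the kernel’s bonding driver; the Slave class, its attribute names, and the speeds are all my own assumptions.

```python
# Illustrative model of balance-tlb transmit slave selection -- NOT the
# actual kernel bonding driver. Names and structure are assumed.

class Slave:
    def __init__(self, name, speed_mbps):
        self.name = name
        self.speed_mbps = speed_mbps
        self.tx_bytes = 0  # bytes recently queued on this slave

    def load(self):
        # "current load (computed relative to the speed)" per the docs
        return self.tx_bytes / self.speed_mbps

def transmit(slaves, frame_len):
    # Queue the frame on the least-loaded slave and report which one.
    slave = min(slaves, key=lambda s: s.load())
    slave.tx_bytes += frame_len
    return slave.name
```

With two equal-speed slaves, successive equal-sized frames simply alternate between ports; with unequal speeds, the faster NIC absorbs proportionally more of the egress traffic.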
So now we know that there aren’t any specific switch requirements, that the load balancing applies to server egress traffic only, and that all ingress traffic is handled by a single NIC. What we don’t know is which MAC address (or addresses) is used on frame egress. If a single MAC address were used for all traffic it would cause MAC flapping on the upstream switch or switches, so clearly that can’t be the case.
I had my theories, but in the absence of equipment to test with I turned to Google. From my searching, it appears that this is a Linux-specific type of load balancing, and while there are many descriptions, most say just about the same thing as Cisco’s. I finally came across a Server Fault question with a reasonable answer.
Essentially, what it says is that egress traffic uses the source MAC of the physical port, while ARP responses contain the MAC of the virtual aggregated interface. This avoids MAC flaps and makes sense. Mystery solved!
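As a toy model of that answer (class and method names are assumed, not actual driver code), the MAC handling can be sketched like this:

```python
# Toy model of balance-tlb MAC handling as described above -- assumed
# names and structure, not the real bonding driver.

class TlbBond:
    def __init__(self, slave_macs):
        self.slave_macs = list(slave_macs)
        self.bond_mac = self.slave_macs[0]  # MAC advertised in ARP replies

    def arp_reply_mac(self):
        # Peers learn one stable MAC, so all return traffic lands on the
        # single receive slave -- no flapping on the upstream switches.
        return self.bond_mac

    def egress_src_mac(self, slave_index):
        # Outbound frames carry the physical port's own MAC, so each
        # switch port learns a distinct, stable source address.
        return self.slave_macs[slave_index]

    def fail_receive_slave(self):
        # Per the bonding docs: if the receive slave fails, another slave
        # takes over the advertised MAC, so peers need no ARP update.
        self.slave_macs.pop(0)
        return self.arp_reply_mac()
```

The key point the model captures is that the MAC peers learn via ARP and the MACs the switches learn from data frames are deliberately different things.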
The only caveats I can think of to this type of load balancing are that, as stated, ingress traffic is not balanced, and that some devices employ MAC lookup “optimizations.” NetApp storage arrays, for example, have a default feature called IP Fastpath. Rather than requesting a MAC via ARP for return traffic, it simply swaps the source and destination MACs of the received frame and sends the reply back out the same port. This behavior causes issues with features such as HSRP on Nexus switches and, as I once discovered, the routing functionality of Cisco’s SMB 500 stacking switches.
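A rough sketch of that swap behavior (hypothetical frame representation; the real feature lives in NetApp firmware, not Python):

```python
# Sketch of the IP Fastpath behavior described above. The dict-based
# frame representation and field names are my own, purely illustrative.

def fastpath_reply(rx_frame):
    # Instead of ARPing for the next hop, swap the received frame's MACs
    # and send the reply out the same port it arrived on.
    return {
        "src_mac": rx_frame["dst_mac"],
        "dst_mac": rx_frame["src_mac"],
        "port": rx_frame["port"],  # same ingress port reused for egress
        "payload": rx_frame["payload"],
    }
```

The trouble cases follow directly: the reply is addressed to whatever MAC the frame happened to arrive from, which is not necessarily where the peer expects its return traffic.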
Since I don’t know how this load balancing method would handle response traffic destined for the physical MAC, it’s impossible to say whether it would be problematic here. Perhaps it warrants further investigation, but direct communication between a NetApp controller and the CIMC seems unlikely.
Note: The NetApp issue I described can be mitigated on Nexus by using the Peer Gateway feature or by disabling IP Fastpath on each NetApp controller (options ip.fastpath.enable off).
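For reference, the Peer Gateway mitigation is a single command under the vPC domain on each Nexus peer (the domain number below is just a placeholder):

```
vpc domain 10
  peer-gateway
```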