I have a bunch of servers connected to Nexus 5020s (running NX-OS 4.0(1a)N2(1)) constantly reporting CRC errors on their NICs. CRC errors are reported on the Nexuses’ egress ports, but ingress ports are clean. What’s happening is pretty interesting.
The N5k considers packets with an ‘invalid’ EtherType to be bad. If it were a store-and-forward switch, it would simply drop the packet on ingress but, since it’s a cut-through switch, bits are already leaving the egress port before there’s any opportunity to drop it. So what to do with a bad packet? When this occurs, the Nexus ‘stamps’ the bad packet by overwriting whatever hasn’t been sent yet. (With zeros? Garbage? It’s also unclear to me whether it overwrites all of the remaining bits or just the Ethernet checksum.) Hence, the downstream device receives a corrupted packet, counts a CRC error and drops it (low enough in the stack that you can’t see it with tcpdump, unfortunately).
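To make the effect on the receiving NIC concrete, here is a minimal Python sketch. It assumes the simplest possible form of stamping, where only the trailing 4-byte frame check sequence (FCS) is replaced with a deliberately wrong value; as noted above, I don't know what the Nexus actually writes. The frame contents, the source MAC and the helper names (`add_fcs`, `fcs_ok`, `stamp`) are all made up for illustration.

```python
import struct
import zlib


def add_fcs(frame: bytes) -> bytes:
    """Append the Ethernet FCS (CRC-32, sent least significant byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare with the trailer."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


def stamp(frame_with_fcs: bytes) -> bytes:
    """ASSUMPTION: model 'stamping' as inverting every bit of the FCS.

    Inverting the trailer guarantees it no longer matches the frame body.
    The real switch may overwrite more than just the checksum.
    """
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return body + bytes(b ^ 0xFF for b in fcs)


# Hypothetical broadcast frame carrying the beacon EtherType 0x05FF.
dst = b"\xff" * 6                      # broadcast destination
src = bytes.fromhex("005056000001")    # made-up source MAC
ethertype = struct.pack("!H", 0x05FF)  # type/length field, network byte order
payload = b"\x00" * 46                 # pad to the 64-byte minimum frame size
frame = add_fcs(dst + src + ethertype + payload)

print(fcs_ok(frame))         # frame as it entered the switch
print(fcs_ok(stamp(frame)))  # frame as it left the egress port
```

The first check passes and the second fails, which matches what I see: clean counters on ingress, CRC errors on egress and on the server NICs.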
And the mystery packets are… VMware ESX beacon probes. These are broadcast from ESX servers to monitor the availability of teamed vmnics. The Ethernet spec (IEEE 802.3) says that EtherType numbering starts at 0x0600; values up to 0x05DC (1500) in the same field are frame lengths, and anything in between is undefined. ESX beacon probes have an EtherType of 0x05FF, which lands in that undefined gap. While I can understand the need to detect (and possibly drop) packets with known EtherTypes, I don’t see the logic behind dropping packets with unknown types.
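For reference, this is roughly how the 16-bit type/length field is interpreted under IEEE 802.3: values up to 1500 (0x05DC) are payload lengths, values of 0x0600 and above are EtherTypes, and the range between the two is undefined. The function name `classify_type_field` is my own, and the sketch says nothing about how the Nexus implements its check.

```python
def classify_type_field(value: int) -> str:
    """Classify the 2-byte type/length field of an Ethernet frame."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("type/length field is 16 bits wide")
    if value <= 1500:        # 0x0000-0x05DC: payload length (802.3 framing)
        return "length"
    if value >= 0x0600:      # 0x0600-0xFFFF: EtherType (Ethernet II framing)
        return "ethertype"
    return "undefined"       # 0x05DD-0x05FF: neither a length nor a type


for value in (0x05DC, 0x05FF, 0x0800):
    print(f"0x{value:04X}: {classify_type_field(value)}")
```

The beacon probe value 0x05FF comes back as undefined, while 0x0800 (IPv4) is a regular EtherType.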
Not sure which version this was fixed in (and I can’t find it in the Bug Toolkit), but switches running 4.1(3)N1(1) do not experience this problem.
