If you landed on this Part 5 directly, review Part 1 first.
We will cover the network in this post. It applies to all tiers, as you should not have dropped packets in any tier, and your network utilisation should be in a healthy range. As the network is normally shared, it’s also easier to monitor per physical data center.
It’s coming to 2017. You should be on 10 GE, as the chance of ESXi saturating a 1 GE link is not something you can ignore. The chance of ESXi saturating 2x 10 GE links is quite low, unless you run vSphere FT and VSAN (or another form of distributed storage).
To help you monitor, you can create the following:
- A line chart showing the maximum number of dropped network packets at the physical data center level. I use a physical data center because its hosts eventually share the same core switches.
- A line chart showing the maximum and average ESXi vmnic utilization at the same physical data center level.
To recap, we need to create the following:
- A line chart showing the maximum number of dropped network packets in the physical DC.
- A line chart showing the maximum and average ESXi vmnic utilization in the physical DC.
I use the physical data center, not the virtual data center. Can you guess why?
It’s easier to manage the network per physical data center. Unless your network is stretched, problems do not span data centers. Review this excellent article by Ivan, one of the networking industry’s authorities.
The problem is how to choose the ESXi hosts from the same data center. It is possible for a physical data center to have multiple vCenter servers. On the other hand, it is also possible for the vRealize Operations World object, or even a single vCenter, to span multiple physical data centers. So you need to manually determine the right object, so that you get all the ESXi hosts in that physical data center. For example, if you have 1 vRealize Operations instance managing 2 physical data centers, you definitely cannot use the World object, as it will span both data centers.
The screenshot below shows the super metric formula to get the maximum number of dropped network packets at a vCenter data center object. Notice I use depth=3, as the data center object is 3 levels above the ESXi host object.
I did a preview of the super metric. As you can see above, it’s a flat line of 0. That’s what you should expect: no dropped packets at all from any host in your data center.
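The roll-up that the super metric performs can be illustrated with a minimal Python sketch. This is not the vRealize Operations formula itself. The host names and sample values are hypothetical, and the function only mimics "take the worst value among all hosts in the data center".

```python
# Hypothetical sample: dropped-packet counts per ESXi host for one
# collection cycle. Host names and values are made up for illustration.
dropped_by_host = {"esxi-01": 0, "esxi-02": 0, "esxi-03": 0}


def max_dropped_packets(samples):
    """Return the highest dropped-packet count among the hosts (0 if empty)."""
    return max(samples.values(), default=0)


worst = max_dropped_packets(dropped_by_host)
print("Worst dropped-packet count in the data center:", worst)
```

Because the roll-up takes the maximum, a single host dropping packets is enough to lift the line off 0, which is what makes this chart useful as an exception report.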
Dropped packets are much easier to track, as you expect 0 everywhere. Utilization is harder. If your ESXi hosts have a mix of 10G and 1G vmnics, generally speaking you would expect the 10G vmnics to dominate the data. This is where consistent design and standards matter. Without them, you need to apply a different formula for each ESXi host configuration.
Let’s look at the Maximum first, then the Average. As I shared in this blog, you want to ensure that not a single vmnic is saturated. This means you need to track it at the vmnic level, not the ESXi host level. Tracking at the ESXi host level, as shown in the following screenshot, can hide what happens at the vmnic level. Take an example. Your ESXi host has 8 x 1 Gb NICs, so 8 Gbps in total. You are seeing a throughput of 4 Gbps. At the ESXi host level, it’s only 50% utilized. But that 4 Gbps is unlikely to be spread evenly. It is possible that one vmnic is saturated while others are hardly utilized.
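To make the 8 x 1 Gb example concrete, here is a small Python sketch. The per-vmnic throughput numbers are hypothetical. They are chosen only so that they add up to 4 Gbps while two vmnics run at line rate.

```python
# Hypothetical per-vmnic throughput in Gbps for one host with 8 x 1 Gb NICs.
LINK_GBPS = 1.0
vmnic_gbps = [1.0, 1.0, 0.9, 0.6, 0.2, 0.2, 0.1, 0.0]  # adds up to 4 Gbps


def host_utilization(throughputs, link_gbps):
    """Utilization as seen at host level: total traffic / total capacity."""
    return sum(throughputs) / (len(throughputs) * link_gbps)


def peak_vmnic_utilization(throughputs, link_gbps):
    """Utilization of the busiest single vmnic."""
    return max(throughputs) / link_gbps


print(f"Host level: {host_utilization(vmnic_gbps, LINK_GBPS):.0%}")
print(f"Busiest vmnic: {peak_vmnic_utilization(vmnic_gbps, LINK_GBPS):.0%}")
```

The host-level figure looks comfortable, while the busiest vmnic has no headroom left. That gap is the reason for tracking the maximum per vmnic.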
As I shared in this blog, the super metric formula you need to copy-paste is
]) * 8 / 1024
The above is based on 4 vmnics per ESXi host. If you have 2x 10 Gb, then you just need vmnic0 and vmnic1. If you have 6 vmnics, then you have to add vmnic4 and vmnic5.
The above gives you the value per ESXi host. You then need to apply it per physical data center. Please review this blog post.
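The arithmetic behind the `* 8 / 1024` factor can be sketched in Python. This is an illustration, not the super metric itself. It assumes the per-vmnic counter is reported in kilobytes per second, so multiplying by 8 and dividing by 1024 yields megabits per second. The vmnic names match the 4-vmnic case, and the sample values are hypothetical.

```python
def kbytes_to_mbits(kbytes_per_sec):
    """Convert KBps to Mbps: 8 bits per byte, 1024 kilobits per megabit."""
    return kbytes_per_sec * 8 / 1024


def max_vmnic_mbps(usage_kbytes_by_vmnic):
    """Return the throughput of the busiest vmnic on one host, in Mbps."""
    return kbytes_to_mbits(max(usage_kbytes_by_vmnic.values()))


# Hypothetical readings (KBps) for a host with 4 vmnics.
host_sample = {"vmnic0": 1024, "vmnic1": 128000, "vmnic2": 2048, "vmnic3": 0}
print(max_vmnic_mbps(host_sample), "Mbps on the busiest vmnic")
```

Adding or removing vmnics only changes the keys in the dictionary, which mirrors adding or removing vmnic entries in the formula.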
Ok, the above will get us the maximum. We then apply the same approach for the average. The great thing about taking the average at the individual vmnic level is that you do not have to worry about how many vmnics an ESXi host has. If you use the data at the ESXi host level, as shown in the screenshot below, you need to divide the number by the number of vmnics.
Once you have the Maximum and Average, you want to ensure that the Maximum is not near your physical limit, and that the Average shows a healthy utilization. A number near the physical limit means you have a capacity risk. A number showing low utilization means you over-provisioned the hardware.
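A short Python sketch shows why averaging at the vmnic level is convenient when hosts have different numbers of vmnics. The host names and throughput values (in Mbps) are hypothetical.

```python
# Hypothetical per-vmnic throughput in Mbps, for hosts with 2 and 4 vmnics.
vmnic_mbps_by_host = {
    "esxi-01": [100, 300],
    "esxi-02": [200, 200, 400, 0],
}


def average_per_vmnic(samples_by_host):
    """Average over every individual vmnic; no vmnic count per host needed."""
    all_vmnics = [v for vmnics in samples_by_host.values() for v in vmnics]
    return sum(all_vmnics) / len(all_vmnics)


def host_level_average(host_total_mbps, vmnic_count):
    """Starting from a host-level total, you must divide by the vmnic count."""
    return host_total_mbps / vmnic_count


print(average_per_vmnic(vmnic_mbps_by_host))  # average across all 6 vmnics
print(host_level_average(sum(vmnic_mbps_by_host["esxi-02"]), 4))
```

With the vmnic-level data, the same function works for any mix of hosts. With host-level totals, you have to know and maintain the vmnic count for each configuration.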
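The check described above can be written down as a small Python sketch. The two thresholds (80% of link speed as "near the limit" and 10% as "low utilization") are assumptions picked for illustration, not recommended values. Adjust them to your own environment.

```python
# Assumed thresholds, as fractions of the physical link speed.
NEAR_LIMIT = 0.80       # hypothetical: maximum above this is a capacity risk
LOW_UTILIZATION = 0.10  # hypothetical: average below this suggests over-provisioning


def assess(max_mbps, avg_mbps, link_mbps):
    """Classify a data center from its maximum and average vmnic throughput."""
    findings = []
    if max_mbps / link_mbps >= NEAR_LIMIT:
        findings.append("capacity risk")
    if avg_mbps / link_mbps < LOW_UTILIZATION:
        findings.append("over-provisioned")
    return findings or ["healthy"]


print(assess(max_mbps=9500, avg_mbps=3000, link_mbps=10000))
print(assess(max_mbps=4000, avg_mbps=500, link_mbps=10000))
print(assess(max_mbps=5000, avg_mbps=2500, link_mbps=10000))
```

Both findings can appear together: a very low average with a maximum near the limit usually points to traffic concentrated on a few vmnics.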
BTW, there is 1 physical NIC that is not monitored in the above. Can you guess which one?
Yes, it’s the iLO NIC. That does not show up as vmnic. Good thing is generally there is very little traffic there, and certainly no data traffic.
In the next post, I will cover Storage.