There are certain super metrics that I create in most installations. Their purpose is to quickly tell me the health of the environment. The questions I want answered are:
- How well is my infra supporting all the VMs? I need the answer in terms of CPU, RAM, Disk and Network, so 4 separate answers are needed. Just because the network is doing well does not mean the storage is. I’ve seen situations where an ESXi host has plenty of RAM but not enough CPU.
- How well was my infra supporting all the VMs? Same question as above, but I need to be able to go back in time. Just because everything is well on Sunday at midnight does not mean it’s well on Friday at 10 am. I need the data in a line chart so I can see patterns. I’ve seen both daily and weekly patterns.
To answer the above, I needed to create 8 super metrics:
- Maximum VM CPU Contention for all VMs
- Average VM CPU Contention for all VMs
- Maximum VM RAM Contention for all VMs
- Average VM RAM Contention for all VMs
- Maximum VM Disk Latency for all VMs
- Average VM Disk Latency for all VMs
- Maximum VM Network Dropped Packets for all VMs
- Average VM Network Dropped Packets for all VMs
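To make the idea concrete, here is a minimal sketch in Python of what those 8 numbers are: a maximum and an average across all VMs for each of the 4 counters. This is not vC Ops syntax. The VM names, counter names and values are all made up for illustration.

```python
from statistics import mean

# Hypothetical per-VM samples for one collection interval (all values invented).
vm_samples = {
    "vm-01": {"cpu_contention": 1.0, "ram_contention": 0.0, "disk_latency": 5.0, "net_dropped": 0.0},
    "vm-02": {"cpu_contention": 4.0, "ram_contention": 2.0, "disk_latency": 25.0, "net_dropped": 0.0},
    "vm-03": {"cpu_contention": 1.0, "ram_contention": 1.0, "disk_latency": 6.0, "net_dropped": 3.0},
}


def summarize(samples):
    """Return {counter: {"max": ..., "avg": ...}} computed across all VMs."""
    counters = sorted({name for values in samples.values() for name in values})
    summary = {}
    for counter in counters:
        values = [vm[counter] for vm in samples.values()]
        summary[counter] = {"max": max(values), "avg": mean(values)}
    return summary


summary = summarize(vm_samples)
for counter, stats in summary.items():
    print(counter, stats)
```

In this made-up data, the maximum disk latency is driven by a single VM while the average stays much lower. That is the kind of difference that keeping both a maximum and an average lets you see.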
I then apply those super metrics at the cluster level. If the environment is huge with many clusters, I’d group the clusters based on their service tiers, and then apply the super metrics at the group level instead. The screenshot below shows the list of super metrics that I have created. The one I selected shows it’s packaged inside a group package, as I apply it at the group level.
What does the formula look like? The screenshot below shows an example. This is a group level super metric, so I put “3” as a parameter. That means it has to go down 3 levels from where it is being applied. I apply it at the group level, and the group is a collection of clusters. So the hierarchy looks like this: Group –> Cluster –> Host –> VM.
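The “go down 3 levels” idea can be sketched in Python as a walk down the hierarchy. This only illustrates the concept, not how vC Ops implements it. The inventory below is hypothetical, and I am assuming the parameter selects the objects exactly that many levels below the starting point.

```python
# Hypothetical inventory: each key is an object, each value lists its children.
children = {
    "Tier-1 Group": ["Cluster-A", "Cluster-B"],
    "Cluster-A": ["esxi-01", "esxi-02"],
    "Cluster-B": ["esxi-03"],
    "esxi-01": ["vm-01", "vm-02"],
    "esxi-02": ["vm-03"],
    "esxi-03": ["vm-04"],
}


def descendants_at_depth(start, depth):
    """Return the objects exactly `depth` levels below `start`."""
    level = [start]
    for _ in range(depth):
        # Replace the current level with all of its children.
        level = [child for parent in level for child in children.get(parent, [])]
    return level


# Group -> Cluster -> Host -> VM: the VMs sit 3 levels below the group.
vms_from_group = descendants_at_depth("Tier-1 Group", 3)
# Cluster -> Host -> VM: from a cluster, the same walk needs only 2 levels.
vms_from_cluster = descendants_at_depth("Cluster-A", 2)
print(vms_from_group)
print(vms_from_cluster)
```

With the wrong number of levels the walk lands on hosts or clusters instead of VMs, so the parameter has to match where the super metric is applied.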
What about a metric that is applied at the cluster level? Below is an example. In this example, I also show the preview feature. This is a very handy feature, as you can quickly test whether your super metric makes sense. You can even look back in time. Notice the formula. There is something in it that is not required: one of the metrics is actually not useful. Can you tell which one? Hint: it’s about your vSphere knowledge, not vC Ops. Further hint: it’s about the relationship between the same counter in the VM and in the ESXi host.