VM rightsizing

VM right-sizing is a commonly misunderstood term because there are actually multiple use cases. It is not one size fits all, hence there is more than one formula. Here are 5 popular use cases:

Your App Team asks for extra vCPU. In this case, the hypervisor overhead is irrelevant. When you size NSX Edge vCPU, you do not need to add extra CPU for the overhead; that work is done outside the Linux guest.

You’re migrating a VM to an ESXi host with 2x the CPU speed, for example from a 2 GHz host to a 4 GHz host. All else being equal, you can halve the VM size: a 16 vCPU VM becomes 8 vCPU. But you’re worried about causing a queue inside the Guest OS.

You’re bulk migrating many VMs to another cluster, with no changes to their configuration. Consider 2 VMs. Both run Windows Server 2019 and have 64 vCPU. Both are running hot, but one of them is very heavy on IO: it sends a lot of network packets and drives a lot of disk IOPS. That 2nd VM has a very different footprint on the ESXi host. It’s much more demanding, because all that IO has to be processed by other physical cores.

Your boss asks you to charge customers properly, accounting for what they actually demand. Would you charge the 2 VMs above the same way? You might for practical reasons, quietly distributing the cost equally, but you know you’re not being fair 😉

You’re planning a tech refresh for Cluster X. It has 24 ESXi hosts and 1,000 VMs. You are hoping to reduce the infrastructure to 12 hosts, so you increase the CPU speed and add cores per socket. Do you consider each individual VM, or do you look at how they behave as a group? The answer is the latter, as 1,000 VMs will not peak at the same time (see the sketch below). Do you consider what happens inside Windows or Linux, or do you look at their footprint on your ESXi hosts? Again the latter, as what happens inside is irrelevant.
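To make the group-vs-individual point concrete, here is a minimal Python sketch with made-up demand numbers (not taken from any real cluster). Because the two VMs peak at different times, the peak of their combined demand is lower than the sum of their individual peaks; with 1,000 VMs the gap is usually much larger.

```python
# Made-up CPU demand samples (GHz) for two VMs over the same 6 intervals.
vm_a = [2.0, 3.5, 1.0, 0.5, 2.5, 1.5]
vm_b = [0.5, 1.0, 3.0, 4.0, 1.0, 2.0]

# Sizing each VM on its own peak over-provisions the cluster...
sum_of_peaks = max(vm_a) + max(vm_b)                # 3.5 + 4.0 = 7.5 GHz

# ...because the group peak is taken on the summed demand per interval.
group_demand = [a + b for a, b in zip(vm_a, vm_b)]
peak_of_group = max(group_demand)                   # 4.5 GHz

print(f"Sum of individual peaks: {sum_of_peaks:.1f} GHz")
print(f"Peak of the group:       {peak_of_group:.1f} GHz")
```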

From the above 5 use cases, there are at least 3 different formulas:

  1. Guest OS Sizing. Excludes VM overhead, includes Guest OS Queue
  2. VM Sizing. Includes overhead, excludes Guest OS Queue
  3. VM Sizing. Includes overhead, includes Guest OS Queue

Before I give you the formula, we need to consider another dimension.

Performance vs Capacity

Sizing for capacity considers the long-term cycle. If there is a 1-minute spike to 100%, you won’t immediately adjust the CPU. On the other hand, troubleshooting a performance problem does not even care if performance was fine 1 minute ago. You are simply interested in the utilization at a point in time.

Sizing also considers a buffer, just in case demand goes up in the future. Performance does not care about what does not happen; it simply looks at the facts (is there a performance problem? Yes/No).
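A minimal sketch of the difference, assuming you already have a list of utilization samples for the object; the percentile, headroom and threshold values are placeholders you would set yourself, not vSphere defaults.

```python
import math

# Hypothetical CPU utilization samples (%) collected over a long window.
samples = [38, 41, 35, 44, 39, 47, 36, 42, 40, 100,
           43, 37, 45, 41, 39, 46, 38, 44, 42, 40]

def size_for_capacity(samples, percentile=95, headroom_pct=10):
    """Capacity looks at the long-term cycle: a high percentile plus a small
    buffer, so a single 1-minute spike to 100% does not drive the answer."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(percentile / 100 * len(ranked)))  # nearest-rank percentile
    return ranked[rank - 1] * (1 + headroom_pct / 100)

def has_performance_problem(latest_sample, threshold=90):
    """Performance only asks a yes/no question about a point in time."""
    return latest_sample >= threshold

print(f"Capacity sizing basis:   {size_for_capacity(samples):.0f}%")       # ~52%
print(f"Performance problem now? {has_performance_problem(samples[-1])}")  # False
```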

Now that we’re ready, here is the first formula.

Guest OS Sizing for Capacity

The formula is

Run + Overlap + Ready + CoStop + IO Wait + Swap Wait + (Guest Run Queue)

Run is used because that’s the only counter not affected by either Power Management or Hyper-Threading. We are sizing the Guest OS, not the VM. For the Guest, we care about “how much you run” in a given period, not “how fast you run” while you’re running within that period. Windows is still running at 100%, even though the underlying VM has to settle for a lower clock speed or compete on a hyper-threaded core.

Usage, Demand and Used account for the efficiency of the run. These are applicable when sizing the VM, not the Guest.

Overlap is added. You know this from your vSphere 101. If not, review this.

Ready, Co Stop, IO Wait and Swap Wait are added. Had there been no contention, Run would have been higher.

Guest OS CPU Run Queue is a counter inside the Guest, indicating processes are waiting in the queue to be executed. Had Windows or Linux had more vCPUs, the queue would have been shorter (all else being equal).
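Here is a rough sketch of how those terms combine into a per-sample number. It assumes you have already pulled each counter as milliseconds accumulated across all vCPUs in a 20-second real-time sample, and that one queued thread in the guest needs roughly one more vCPU; both the retrieval and that last conversion are my simplifications, not vSphere definitions.

```python
SAMPLE_MS = 20_000  # a vSphere real-time sample covers 20 seconds

def guest_vcpu_demand(run_ms, overlap_ms, ready_ms, costop_ms,
                      io_wait_ms, swap_wait_ms, guest_run_queue=0):
    """Guest OS sizing for capacity, per sample:
    Run + Overlap + Ready + CoStop + IO Wait + Swap Wait + (Guest Run Queue).
    Each *_ms value is milliseconds summed across all vCPUs in the sample."""
    demand_ms = (run_ms + overlap_ms + ready_ms + costop_ms
                 + io_wait_ms + swap_wait_ms)
    vcpus_busy = demand_ms / SAMPLE_MS   # how many vCPUs' worth of time was needed
    return vcpus_busy + guest_run_queue  # queued threads want a core too (assumption)

# Example: a 4 vCPU VM that ran 60 s of vCPU time in the sample, lost time to
# Ready/CoStop/IO Wait, and had 1 thread queued inside the guest.
print(guest_vcpu_demand(run_ms=60_000, overlap_ms=1_000, ready_ms=8_000,
                        costop_ms=2_000, io_wait_ms=3_000, swap_wait_ms=0,
                        guest_run_queue=1))  # 4.7 vCPUs of demand
```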

Plot the above numbers over time so you can include the peaks. Add headroom as you deem appropriate, but keep it minimal.

Project the above over time to arrive at a single number (at a single point in the future).

Once you have a recommended number, adjust for NUMA. This naturally depends on the ESXi host’s number of cores per socket.
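Putting those last steps together, here is one way to turn the per-sample numbers into a recommendation. The headroom percentage and cores-per-socket value are inputs you would choose, and rounding to a multiple of cores per socket once the VM spans a socket is just one simple NUMA heuristic, not a VMware rule.

```python
import math

def recommend_vcpus(demand_series, headroom_pct=10, cores_per_socket=16):
    """Turn per-sample vCPU demand into a single recommendation:
    peak of the series, a small headroom buffer, then a NUMA-friendly round-up."""
    peak = max(demand_series)                           # or a high percentile
    with_headroom = math.ceil(peak * (1 + headroom_pct / 100))
    if with_headroom <= cores_per_socket:               # fits inside one NUMA node
        return with_headroom
    # Otherwise round up to a multiple of cores per socket so vNUMA maps cleanly.
    return math.ceil(with_headroom / cores_per_socket) * cores_per_socket

# Example: per-sample demand (e.g. from the previous sketch) over a busy period.
print(recommend_vcpus([3.1, 4.7, 4.2, 5.6, 3.9]))  # peak 5.6 -> 7 vCPUs
```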

2 more things…

As you can see from the above formula, this is not the Guest OS’s actual CPU utilization. If you want to see what the Guest OS actually uses, take the Guest OS CPU Usage counter. IMHO, that counter has no purpose: you do not monitor utilization for the sake of utilization. You are either doing capacity or performance, and that drives the formula.

For performance, you need to consider 3 counters:

Guest OS Usage + Context Switch + Run Queue

Yes, all the above counters are inside the Guest. You’re asking for Guest performance, not VM performance. They are related but not identical. VM performance has different counters, such as Ready and CoStop.
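A minimal sketch of a guest-side check built on those 3 counters; the thresholds are illustrative placeholders you would tune per workload, not vendor guidance.

```python
def guest_cpu_healthy(usage_pct, context_switches_per_sec, run_queue, vcpus,
                      usage_limit=90, cs_limit=50_000):
    """Guest OS CPU performance check using the three in-guest counters:
    - sustained usage near 100% means the guest has no headroom,
    - a very high context switch rate suggests too many runnable threads,
    - a run queue longer than the vCPU count means threads are waiting for CPU."""
    return (usage_pct < usage_limit
            and context_switches_per_sec < cs_limit
            and run_queue <= vcpus)

print(guest_cpu_healthy(usage_pct=85, context_switches_per_sec=12_000,
                        run_queue=3, vcpus=4))   # True
print(guest_cpu_healthy(usage_pct=97, context_switches_per_sec=80_000,
                        run_queue=9, vcpus=4))   # False
```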

VM Sizing for Capacity

VM sizing differs from Guest OS sizing due to overhead (as explained above). All that IO processing is done twice: once by Windows/Linux, and once by the VMkernel.

The CPU System counter accounts for this overhead. It is charged at the VM level, not to an individual vCPU. The VMX world should also be included, although it’s negligible most of the time.

Since we’re interested in the VM’s impact on the infrastructure, we need to consider CPU frequency. This also enables comparison across ESXi hosts with different speeds. A 2 vCPU VM on a 4 GHz host may need 4 vCPU when moved to a 2 GHz host.

HT is automatically accounted for. With lower efficiency, the VM will simply run longer. Instead of running at 40% for 5 minutes, it may run at 90%. If the demand exceeds 100%, then it will run even longer, and a queue will develop inside the Guest OS.

The formula is similar, but we’re using Used instead of Run because Used accounts for CPU speed.

Used + Ready + CoStop + IO Wait + Swap Wait + (Guest Run Queue)
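As a rough sketch, here is the Used-based version kept in GHz so it can be compared across hosts with different clock speeds. The inputs are again assumed to be milliseconds per 20-second sample, the source host’s nominal clock speed is used for the GHz conversion, and the optional Guest Run Queue term is left out to keep the frequency conversion clear; all of that is my simplification.

```python
SAMPLE_MS = 20_000  # a vSphere real-time sample covers 20 seconds

def vm_ghz_footprint(used_ms, ready_ms, costop_ms, io_wait_ms, swap_wait_ms,
                     source_ghz):
    """VM footprint per sample, in GHz. Used (not Run) is the base, since Used
    already reflects how efficiently the cycles were delivered (power
    management, hyper-threading)."""
    total_ms = used_ms + ready_ms + costop_ms + io_wait_ms + swap_wait_ms
    return (total_ms / SAMPLE_MS) * source_ghz

def vcpus_on_target(ghz_footprint, target_ghz):
    """The same footprint expressed as vCPUs on a host with a different clock."""
    return ghz_footprint / target_ghz

# The example from the text: an 8 GHz footprint is 2 vCPUs on a 4 GHz host
# but 4 vCPUs on a 2 GHz host.
footprint = vm_ghz_footprint(used_ms=38_000, ready_ms=1_000, costop_ms=500,
                             io_wait_ms=500, swap_wait_ms=0, source_ghz=4.0)
print(footprint)                        # 8.0 GHz
print(vcpus_on_target(footprint, 4.0))  # 2.0
print(vcpus_on_target(footprint, 2.0))  # 4.0
```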

VM Sizing for bulk migration

In this use case, you do not care about what happens inside the Guest OS, so the Guest Run Queue term is dropped:

Used + Ready + CoStop + IO Wait + Swap Wait
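For the bulk-migration case, here is a minimal sketch that sums the per-VM GHz footprints per interval and compares the group peak against the target cluster. The usable-capacity factor (reserving room for HA and growth) and the tiny two-VM data set are assumptions for illustration only.

```python
def target_cluster_fits(per_vm_ghz, hosts, cores_per_host, ghz_per_core,
                        usable_factor=0.8):
    """Bulk-migration check: does the group footprint fit the target cluster?
    per_vm_ghz maps each VM to its GHz footprint per interval (same timestamps).
    The peak is taken on the summed series, since VMs do not peak together."""
    group_demand = [sum(samples) for samples in zip(*per_vm_ghz.values())]
    group_peak = max(group_demand)
    usable_ghz = hosts * cores_per_host * ghz_per_core * usable_factor
    return group_peak <= usable_ghz, group_peak, usable_ghz

fits, peak, usable = target_cluster_fits(
    {"vm-heavy-io": [6.0, 7.5, 5.0], "vm-cpu-only": [3.0, 2.0, 4.5]},
    hosts=12, cores_per_host=32, ghz_per_core=3.0)
print(f"fits={fits}, group peak={peak:.1f} GHz, usable={usable:.1f} GHz")
# fits=True, group peak=9.5 GHz, usable=921.6 GHz
```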

Hope that helps. Let me know your findings in your production environment. Production is always an interesting place, full of surprises and weird anomalies.
